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The Intel Influence 


FOCUSED. DRIVEN. 
MOTIVATED. 


It's Intel. And 
the high achieving 
individuals who 
thrive and succeed 
in our energy- 
charged 

atmosphere. But 
the results have paid off, with one innovation 
after another. Intel has become much more than just 
another semiconductor company...a leading 
microcomputer company offering opportunities at 
components and systems levels. 

Changing history through innovations in the 
industry has long been Intel's strength. We are 
currently seeking professionals for the following areas: 

• Operating Systems 

• Systems Architecture 

• Architecture Simulation 

• I/O Subsystems (disk, network, etc.) 

• Compilers 

• Benchmarking 

• Competitive Analysis 

• Technical Marketing 

• Hardware Systems Design 

• Multiprocessing 

• PCAT-Architecture 

• VLSI-Logic/Circuit Design 

We also have positions available for Senior 
Technical Staff Members in the following areas: 

• System Architecture and future 
directions. 

• CPU/Microprocessors 

• I/O Architectures and future 
components. 



• Plan, manage, implement and 
operate a lab for benchmarking 
systems from Intel and other vendors. 

• Manage a small group that will 
develop a strategy and drive 
implementation of tools to simulate 
complete/partial systems. 

VLSI Design - Logic & Circuit 

Requires 3 plus years industrial experience on 
microprocessors, peripherals or memories. 

Compiler Software Engineer 

Responsible for generating compiler code and 
utilizing experience with RISC architectures. 

Technical Marketing Engineers 

Will target customers for the i860™ microprocessor 
or i486™ microprocessor. Requires workstation, 
minicomputer, graphics systems, or PCAT architecture 
expertise. 

All positions require extensive experience and a 
graduate degree, MSEE or CS required. 


If you'd like to learn more about the positives of 
being focused, driven, motivated, we'd like to hear 
from you. For immediate consideration, please call 
Rebecca at (408) 765-1546 or send your resume to: 
Intel Corporation, Dept. S396, P.O. Box 58119, 
Santa Clara, CA 95052-8119. 
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From the 
Editor-in-Chief 


This issue 


T his October issue of IEEE Micro 
discusses a wide variety of topics. 
While it carries no central theme, 
future issues will. Our European edition 
this December will present neural net¬ 
work articles concerning the current 
research on this form of parallel 
computing. The February 1990 issue 
will be dedicated to several of the best 
presentations given at the recently held 
Hot Chips Conference at Stanford 
University. 

The first article in the current issue 
presents N. Jagadish, J. Mohan Kumar, 
and L.M. Patnaik discussing a procedure 
that uses dual-ported RAMs to help mul¬ 
tiple processors communicate. The 
scheme presented has a large bandwidth 
and minimal software and hardware 
costs as well as being reliable. Next, 

Tho Le-Ngoc and Minh Tue Vo discuss 
implementation of the fast Hartley trans¬ 
form, which deals efficiently with real 
data. After evaluating FHT execution 
times and memory requirements, the au¬ 
thors compare these parameters with the 


fast Fourier transform for both hardware 
and software implementations. 

Author Walt Price discusses bench¬ 
mark tools, their use, and the results of 
using the benchmarks on various well- 
known computer systems. An-Chi Liu 
and Ranjani Parthasarathi look at the 
problem of monitoring a multiple micro¬ 
processor system with available hard¬ 
ware. Monitoring the performance of 
complex systems is important because 
performance can be tuned to make the 
system more efficient and cost effective. 
Monitoring a multiple processor system 
also tells the user the overall perform¬ 
ance characteristics of the system, thus 
allowing the system to be configured in 
an optimal manner. 

David Franklin and David Ostler pre¬ 
sent their work with the proposed IEEE 
standard (IEEE PI073) for interfacing 
medical devices to a computer. P1073 
differs from most interfacing and local 
area network standards due to the envi¬ 
ronment in which the medical instru¬ 
mentation must function. 



Because computer design, develop¬ 
ment, and synthesis are near and dear to 
most Micro readers’ hearts, we offer 
an article on the subject by William 
Birmingham, Anurag Gupta, and Daniel 
Siewiorek. The Micon system synthe¬ 
sizes computer systems from higher 
level specifications. The Micon set of 
programs supports hardware system de¬ 
sign. Its very sophisticated software 
makes use of artificial intelligence to 
achieve the desired synthesis. 

We add to the considerable amount of 
research and development work going 
on in the area of neural networks with 
our article by Mark Walker, Paul 
Hassler, and Lex Akers. These authors 
discuss the problems encountered in de¬ 
signing neural networks and the meth¬ 
ods they chose to circumvent these 
problems. Their design of an adaptive 
CMOS neural array proves interesting. 

Also with this issue Carl Warren joins 
the Editorial Board of IEEE Micro. Carl 
works for McDonnell-Douglas and has 
been active for many years in the IEEE 
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From theEIC 


In the mailbag 

December 

“I liked the DSP articles, very 
readable indeed.” W.F.D., Southamp¬ 
ton, England 

“I would like to see in-depth ar¬ 
ticles on the field bus, Bitbus, CAN, 
Profibus, MIL-1553, FIP, and ISI 
bus.” R.T.S., Ranchi, India (As I 
mentioned earlier, we will soon be 
adding a Micro Standards department 
written by an editor with an interest 
in standards and buses. He will be on 
board soon.—J.H.) 

“I liked the total issue. That was 
an excellent presentation of DSP. I 
would like to see [articles on] buses 
(Futurebus) and related products 
(special chips and boards).” B.C., 
Zagreb, Yugoslavia 
“I liked the whole magazine. Good 
job.” M.S.S., CAFB, MS 

“I would like to see Bitbus, MIL- 
1553, Mini-Map, and CAN field 
buses compared, and their supporting 
chips.” P.R., Tampere, Finland 
“I liked Micro Review. I would 
like to see more materials on com¬ 
mercial software and hardware.” 

F.F., Tehran 


February 

“I liked the robot-arm controller 
article. I would like to see more on 
commercial software packages.” 
F.F., Tehran 

“I would like to see articles on the 
i860. Next, and i486.” V.R.M., Ban¬ 


standards committees and in the techni¬ 
cal publications business. He plans to 
revive our Micro Standards department 
in the near future. 

Three members of the board—Dave 
Gustavson, Ken Majithia, and Shmuel 
Ben-Yaakov—leave the board. These 
three men have served Micro for four 
years (two terms) and cannot be reap¬ 
pointed. Dave Gustavson has been one 
of our prime article reviewers, contrib¬ 
uting many ideas and supporting many 
of the special issues that we have pub¬ 
lished. Ken Majithia, another excellent 
reviewer, also served in the past as our 


galore, India (Adetailed article on 
the i860 appeared in the August issue 
of Micro, and we briefly discussed 
the Next system in June’s Micro 
View column. We hope to have an ar¬ 
ticle on the 486 in the February Hot 
Chips issue.—J.H.) 

“I liked the Prolog coprocessor, 
the MARS approach, and EMMA2, 

A High-Performance Hierarchical 
Multiprocessor.” U.S.S., Lima 

“I liked virtually all the content. I 
would like to see articles on micro¬ 
processor applications to command 
and control systems and more on 
robotics.” M.E.A., Jeddah, Saudi 
Arabia 

April 

“I would like to see more simple 
ideas on hardwired-logic simulation 
and industrial controllers.” L.P., 
Recife, Brazil 

“I liked the Micro Law article. Ar¬ 
ticles like this are very informative.” 
R.C., Chittenango, NY 

“I would like to see more of those 
technical reviews (Motorola 88000). 
Why [present] only the microproces¬ 
sor and not the MMU (memory man¬ 
agement)?” M.L., Grimbergen, Bel¬ 
gium (We have a manuscript in the 
review process that covers external 
MMU chips. We hope to have the re¬ 
view completed shortly.—J.H.) 

“I liked the article on the Motorola 
88000.1 would like to see an article 
on the Intel N-10.” A.L.B., Bulacan, 


New Products editor. Shmuel Ben- 
Yaakov provided reviews for the last 
four years. His comments and selection 
of other reviewers enhanced the quality 
of the articles and the magazine. I wish 
to thank these three men for their contri¬ 
butions to the excellence of Micro and 
wish them the best. They will be missed. 

Just recently, I was discussing Micro 
with one of its founders. He pointed out 
that the original concepts for the maga¬ 
zine included the exchange of software 
and ideas on the efficient use of existing 
software. Since I have been Editor-in- 
Chief, I have seen several full-length ar- 


Philippines (See our August issue; 
Intel now calls it the i860.—J.H.) 

“See the interest codes! How about 
[supplying] pricing information on 
microprocessors reviewed? The edi¬ 
tor says that a database management 
system is a software tool for micro¬ 
processor design. I do not agree!” 
P.S., Brentwood, NY (The interest 
codes found at the end of each article 
and department in every issue are im¬ 
portant to us for two reasons. We 
want to know that you have read the 
article, and we want to know what 
you think about what you have read. 
This data helps us select the areas on 
which we should concentrate when 
considering the future editorial con¬ 
tent of Micro. —J.H.) 


June 

“I would like to see some articles 
on network model databases. Rela¬ 
tional is over emphasized. It’s time 
to shift the focus.” S.W., Sherbrooke, 
Canada 

“I would like to see [articles on] 
commercial LAN-based products.” 
T.I., Rochester, NY 

“I liked the special issue on DSP 
[December 1988] very much. I would 
like to see implementations for DSP 
algorithms by using the transputers.” 
P.T., Pittsburgh 

“I liked Letters, Reviews. I dis¬ 
liked articles on TRON.” S.M., North 
York, Canada 


tides on software and its uses. How¬ 
ever, I have not seen any short (letter- 
length) software programs that might be 
of interest to readers. If you have a short 
program that you think might interest 
the readers of Micro, why not send it in? 
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Hubert Kirrmann 

Asea Brown Boveri Research Center 
CRBC.l 

CH-5405 Baden , Switzerland 


Jessi: A project to 
answer the giants 

n the last few years, Europe has ex¬ 
ploded with government-sponsored 
programs for the development of high 
technology. Every firm, university, or 
research institute tries to get its share of 
European Counting Units, or ECUs, 
from a European economic community 
program such as ESPRIT, Eureka, and 
Race (see Table 1). 

Europeans originally regarded these 
programs as answers to US Department 
of Defense projects or to Japan’s Fifth 
Generation Computer project sponsored 
by MITI, the Ministry of International 
Trade and Industry. All of these pro¬ 
grams rely on the belief that if you just 
give scientists enough money and facili¬ 
ties they will come up with productive 
inventions to boost the country’s na¬ 
tional gross product. 

In practice, things look a little differ¬ 
ent. European community money does 
not flow liberally. ESPRIT grants do not 
exceed 50 percent of project costs—and 
they are usually much lower. For Eureka 
(Mitterand’s European Research Initia¬ 
tive), the Brussels administrative office 
pays solely for meetings. All programs 
require that firms and research institutes 
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Table 1. 



European programs in information and communication technology. 




Budget 



No. of 

in ECUs* 

Name 

Topics 

Projects 

(millions) 

Aim(1988) 

Application of informatics to 
medicine 

43 

20 

Cost (1971) 

Telecommunications, 
transportation, oceanography, 
new materials, environmental 
protection, meteorology, 
agriculture 

37 

250 

Delta (1988) 

Application of informatics to 
teaching 

30 

20 

Drive (1988) 

Traffic control, automotive 
electronics 

61 

60 

ESPRIT I (1983) 

Microelectronics, software 
engineering, office automation, 
computer-integrated 
manufacturing 

226 

750 

ESPRIT 11 (1987) 

Continuation of ESPRIT I, basic 
research, artificial intelligence, 
computer science 

158 

1,600 

Eureka (1985) 

Informatics, telecommunication, 
computer-integrated 
manufacturing, robotics, new 
materials, lasers, biological 
techniques, environmental 
protection, energy, and 
transportation 

297 

6,400 

Jessi (1987) 

Microelectronics, new 
technologies, applications, 
manufacturing, basic research 


3,800 

Race (1987) 

Technological support for 
telecommunications 

94 

550 

*An ECU is worth about $1 in US currency. 























collaborate towards a common goal. 

This approach requires a steering com¬ 
mittee and kilometers of red tape. Col¬ 
laboration within a group is shadowed 
by suspicions that other members will 
benefit more from the total outcome. (If 
it really is a profitable research subject, 
why should we share it with them ?) 

Researchers complain that they spend 
their time filling out research proposals, 
attending boring meetings in the ill- 
ventilated rooms of Brussels’ 
Berlaymond building, and writing 
reports. Some even pretend that the 
contribution of Brussels is more than 
offset by the additional administrative 
work it causes. People are beginning to 
realize more than ever that money alone 
is not enough. Clever minds cannot be 
generated on the spot by cash. Finally, 
control is weak: Once the project has 
been approved. Brussels says little about 
the results. 

So how do programs succeed? The 
winners appear to contain participants 
that would have collaborated anyhow. 
The programs just become an opportu¬ 
nity to pocket some additional ECUs. 
Also, a clear goal with a tangible result, 
like shooting a space shuttle into orbit 
before 1995, characterizes successful 
programs. Finally, successful programs 
require the members to be personally 
compatible. Members must be friends, 
or programs don’t work. 

All things considered, the Jessi (Joint 
European Submicron Silicon) program 
of Eureka seems to be on the right track. 
The electronics giants of Europe—Phil¬ 
ips of the Netherlands, CGS-Thomson 
of France, and Siemens of Germany, 
among others—have joined forces to 
develop the technology for a 64-mega- 
bit, 0.3-micrometer RAM before 1996. 
The European community agreed in 
June to support this program with 4 bil¬ 
lion dollars. Jessi answers Sematech, a 
semiconductor manufacturing consor¬ 
tium in the US made up of its own 
giants. Jessi also is Europe’s retort to 
the huge spending of statelike Japanese 
firms. 

In the meantime, seven leading US 
manufacturers have cofounded US 
Memories, Inc., as a response to Jessi 
and to the Japanese domination of the 
dynamic RAM business. 


T he success of the Jessi program 
also rests on the exploitation of a 
huge silicon market that is grow¬ 
ing at 14 percent per year, according to 
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Table 2. 

Summary of Jessi goals. 


Time periods 

Specifications 

1990-1992 

1993-1995 

Structure width 

0.5 micrometers 

0.4-0.3 micrometers 

Chip area 

200 millimeters 2 

500 millimeters 2 

Memory elements/chip 

40,000,000 

200,000,000 

Logic elements/chip 

2,000,000 

10,000,000 


Jessi’s bureau. Analysts expect the mar¬ 
ket to reach 160-billion US dollars by 
the year 2000, with a 28-billion US-dol- 
lar share for Europe. The estimates for 
the 1992 European market for silicon 
reach 10 million US dollars. Europe 
could secure millions of jobs in this area 
and associated domains. 

The Siemens-Philips team has previ¬ 
ously shown its combined strength in 
the Mega-project. Siemens developed a 
1-Mbit DRAM, while Philips produced 
a 256-kilobit static RAM (both in 1.2- 
micrometer versions.) They used the 
same technology—a static RAM just 
takes four times more space than a dy¬ 
namic one. Siemens began production 
this year in Regensburg, Germany, be¬ 
coming (after IBM) the second company 
to manufacture 1-Mbit DRAMs outside 
of Japan. IBM, in the meantime, had al¬ 
ready started its 4-Mbit DRAM produc¬ 
tion in Sindelfingen, Germany (luring 
Chancellor Helmut Kohl into the 
cleanroom as a special public-relations 
attraction). 

The Jessi project also explicitly re¬ 
sponds to European fears that the con¬ 
trol of this strategic technology could 
escape them. Two thirds of all ICs in 
European products come from either the 
US or Japan. The US could stop exports 
to Europe for national security reasons, 
while Japan could stop for economic 
reasons. The oil crisis of 1973 could one 
day be followed by a silicon crisis. Al¬ 
ready, agreements between Western 
countries prevent exporting products 
that contain advanced embedded ICs of 
US origin to certain countries. 

Technical, financial, and emotional 
motivation has translated into a strong 
administrative framework for Jessi. The 
team worked out its first comprehensive 
plan at the Frauenhofer Institute for Mi¬ 
crostructure Techniques in August 1988. 
They divided the work into several key 
areas: technology, equipment and mate¬ 


rials, application (design tools), and ba¬ 
sic research. The administrative effort 
has been impressive. A 1,200-page 
green book defines the project goals, 
which are summarized in Table 2. 

Technology projects focus on master¬ 
ing submicrometer production tech¬ 
niques. Equipment projects should help 
overcome Europe’s traditional weak¬ 
nesses in cleanrooms, wafers, litho¬ 
graphic equipment, encapsulation, and 
test procedures. Application projects in¬ 
duce future application-specific ICs. 
Memories provide the first step in mas¬ 
tering the technology, but in the long 
run, custom ICs should become the key 
market. A unified silicon language 
should allow clients and silicon foun¬ 
dries to speak the same language, yet 
preserve user know-how from the prying 
eyes of the silicon maker. Advanced 
CAD tools, design verifiers, and silicon 
compilers prepare this phase. Finally, 
basic research projects should show the 
way beyond current physical barriers. 

Another element that promotes Jessi’s 
chances of success is that industrial 
firms share in its leadership (academia 
is virtually nonpresent). The governing 
board resides firmly in the hands of the 
founding firms. In fact, a retired director 
of Philips heads the board. Founding 


The success of 
the Jessi program 
rests on the 
exploitation 
of a huge 
silicon market. 











Department 


members of the governing board also 
benefit from early review of project pro¬ 
posals. This advantage provides a mar¬ 
ket-analysis tool that allows frequent 
exposure to good ideas. 

T he Jessi project invites other firms 
and countries outside of the 
European economic community to 
join the club—provided they bring their 
own money along. Non-European 
partners are welcome under two con¬ 
ditions. They must conduct research, 
development, and production in Europe, 
and the members must reciprocate in 
this collaboration. 

While IBM easily fulfills the first 
condition, it does not meet the second. 
Europeans have no access to US Memo¬ 
ries or Sematech. It is likely, however, 
that a common ground will result from 
the negotiations this fall between Semat¬ 
ech and Jessi. IBM’s experience in the 
field of producing 4-Mbit DRAMs could 
outweigh the “reciprocal” restriction of 
the Europeans. US Memories may also 
open its doors to Europeans. Its goals 
seem similar to Jessi’s, an experienced 
IBM manager leads the firm, and IBM’s 
4-Mbit DRAM will most likely emerge 
as its first product. 

Sematech faces a similar dilemma, 
with the Japanese NEC Corporation 
knocking at the door—represented by its 
US subsidiary, of course. 

Those who are interested in learning 
more about Jessi may contact the or¬ 
ganization by writing the Jessi Planning 
Group, Margarete-Steiff-Weg, 3D-2210 
Itzehoe, Federal Republic of Germany. 


Reader Interest Survey 

Indicate your interest in this department 
by circling the appropriate number on 
the Reader Interest Card. 

Low 174 Medium 175 High 176 


Letters 


Macintosh issues welcomed 


To the Editor: 

My initial reaction to the August 
issue of IEEE Micro was to return the 
reader’s card to you with my com¬ 
ments appended. Subsequently, I sur¬ 
mised that writing you my comments 
in this short note would be more 
appropriate. 

First, I would like to register my in¬ 
terest as a reader of two departments, 
Micro News and Micro Review, as 
numbers 179 and 185, respectively. 

With regard to the item in Micro 
News, “IEEE Standards to launch hy¬ 
pertext series,” I will place it on a lo¬ 
cal Washington, D.C., Macintosh BBS 
of which I am a member. I believe that 
there are interested members who may 
want to become involved in this effort 
from a Macintosh viewpoint. 

My other comments pertain to the 
“A new Macintosh environment” item 
written in Micro Review. A very 
important matter that the reviewer of 
the Wingz application did not discuss 
was the configuration of his Mac Plus 
computer. If the reviewer had system 
version 6.0.2 installed in his Mac Plus, 
coupled with the later market version 
of the Mac Plus with its small 
computer system interface (SCSI), I 
would be very surprised if he would 
have encountered all the problems that 
he alluded to in his comments. I have 
used Wingz routinely since it came on 
the market and I have nothing but 
accolades for this application. There 
are some user-friendly options that 
could be added to Wingz to make it 
more efficient; nevertheless, compared 
to Excel, Wingz is a heads-and- 
shoulders performer. 

I take the liberty of enclosing a re¬ 
view of Wingz performed by Frank 


Potter (with his permission). Potter 
reviews Macintosh products for the 
Washington, D.C., Metro Apple Users 
Group (Washington Apple Pi). The re¬ 
view I have enclosed was his initial cut 
at reviewing Wingz and I extracted it 
from the local Macintosh BBS. [Limited 
space keeps us from reprinting this re¬ 
view. Interested readers should extract 
it from the BBS. — Ed.] 

Keep up the good work. I particularly 
enjoy reading about important issues 
pertaining to Apple Macintosh products, 
such as those involving the IEEE stan¬ 
dards for hypertext. 

Louis F. West 

Fairfax, VA 


Richard Mateosian, Micro Review 
editor, replies: 

I have system version 6.0.3 installed, 
and my Mac Plus has a SCSI interface. I 
purchased it in January 1987. 

Upon reviewing my column, the only 
even vaguely negative things I said 
about Wingz on my Mac Plus were: “Its 
fancy graphics were painfully slow on 
that machine.” “I did not care for the 
Took and feel’ of Wingz nearly as much 
as that of Excel.” 

I’m sure that if Louis West will run 
the Wingz demo program on a Mac 
Plus, he will find it just as painfully 
slow as I did. The special graphics fea¬ 
tures designed to take advantage of an 
FPU simply cannot run very fast on an 
unadorned 68000. As to my personal 
feelings about the Took and feel’ of 
Wingz, perhaps I should have kept them 
to myself. I readily admit that such 
judgments are subjective. 
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Micro Law 


Richard H. Stern 
Law Offices of Richard H. Stern 
1300 19th Street NW, Suite 300 
Washington, DC 20036 

Appropriate and inappropriate legal protection of user 
interfaces and screen displays 

Part 3, Copyright law, the courts, and the Copyright Office 


C oncerns similar to those discussed 
in the two preceding issues of this 
magazine—about how copyrights 
might hinder software progress more 
than they encourage it—attend any grant 
of intellectual property protection. The 
concerns are legitimate, but on balance 
they are not determinative. There are 
also strong reasons for according intel¬ 
lectual property protection to techno¬ 
logical advances. Protection provides an 
appropriate level of incentive to inno¬ 
vate. Only a calculus of potential bene¬ 
fits and harms, showing whether the 


benefits weigh less or more than the 
harms, can be determinative. 

The patent and copyright laws are au¬ 
thorized by the patent and copyright 
clause (Article I, Sec. 8, Cl. 8) of the 
United States Constitution. It gives the 
US Congress power to promote the 
progress of science and useful arts by 
securing to authors and inventors, for 
limited times, the right to exclude others 
from exploiting the authors’ writings 
and the inventors’ inventions. 

The US Supreme Court has held these 
words to be both a grant of power to 


Congress and a limitation on congres¬ 
sional power. By negative implication 
from the grant, Congress may not enact 
intellectual property laws that will hin¬ 
der the progress of science and useful 
arts. Congress may not grant perpetual 
patents or copyrights, and it may not 
withdraw from the public things already 
in the public domain. The patent and 
copyright laws therefore exist to pro¬ 
mote the progress of human knowledge. 
That is their raison d’etre. 

The copyright laws apply to computer 
programs. Until 1980, that they applied 


Series Highlights 


User interfaces for computer pro¬ 
grams are often significant in deter¬ 
mining how easy it is to learn to use 
and how efficiently users can utilize 
the programs. They are therefore sig¬ 
nificant factors in the commercial 
success of computer software. 

Should the law protect screen dis¬ 
plays and other aspects of user inter¬ 
faces from competitive imitation? If 
so, what is the best mechanism to do 
so? 

The US Copyright Office recently 
ruled that screen displays should be 
protected, but only as an “integrally 
related” part of the code of the com¬ 
puter programs to which they relate. 
The Computer Society had urged the 
Office to protect screen displays 
separately and independently from 
the code of the computer programs 
associated with them. A court then 
adopted a “legal fiction” that the 
Copyright Office may allow only one 


registration for the combination of code 
and screen displays, but the court would 
treat the registration as if it were two 
registrations, one for the code and one 
for the screen displays. 

The main question posed by legal pro¬ 
tection of screen displays and related 
aspects of user interfaces concerns two 
issues. One is the importance of keeping 
the teachings of the science of human 
factors analysis (human-computer inter¬ 
action) freely available to workers in 
the field. Just as important is the need 
to provide incentives to creators and in¬ 
vestors in innovative screen display 
technology. 

Some court decisions and some ag¬ 
gressive litigation positions of software 
publishers have raised legitimate indus¬ 
try concerns. People wonder whether 
copyright protections for screen displays 
will be used to lock up utilitarian fea¬ 
tures of screen design, thereby impover¬ 
ishing the toolbox of other workers in 


the screen design field. 

The concerns are most acute in the 
case of very important or necessary 
techniques, such as those dictated by 
human factors analysis. But concerns 
may also be appropriate about screen 
design conventions and emerging 
conventions. Moreover, these con¬ 
cerns extend beyond screen display 
expedients to keystroke conventions 
such as </> (slash ) for invocation 
of command mode. 

A very troubling question is 
whether user habituation to particular 
user interface or screen display expe¬ 
dients—resulting from users’ invest¬ 
ments in self-education—should give 
rise to a legally protected interest. 

Part 3 of this series addresses how 
the courts and Copyright Office are 
dealing with concerns of this type, 
after providing a brief copyright law 
tutorial. 
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at all to computer programs was ques¬ 
tionable. Congress then passed an 
amendment to the Copyright Act. In ef¬ 
fect Congress extended its protections to 
“a set of statements or instructions to be 
used directly or indirectly in a computer 
in order to bring about a certain result.” 

Initially, the courts disagreed over 
whether these words covered object 
code as well as source code and whether 
they covered “ROMmed” code (com¬ 
puter program code stored in a semicon¬ 
ductor chip). But eventually the courts, 
in effect, interpreted the statutory terms 
used directly to mean machine language 
and used indirectly to mean code that 
had to be interpreted, compiled, or as¬ 
sembled into machine language to be 
used in a computer. Moreover, “ROM¬ 
med” code was held to be protected un¬ 
der the copyright law. 

The patent laws also apply to com¬ 
puter programs, at least when the pro¬ 
gram is associated with a machine sys¬ 
tem that does more than just compute 
numbers. As a practical matter, how¬ 
ever, the requirement of technical merit 
(“nonobviousness”), the $5,000 to 
$15,000 cost, and the three-year time 
delay involved in issuance of an elec¬ 
tronic-equipment patent have so far 
tended to make the patent system of 
little importance for screen displays. (In 
the last several years, interest has grown 
in design patents on aspects of screen 
displays, such as icons. Design patents 
protect ornamental, nonfunctional as¬ 
pects of products. They usually issue in 
less than a year and cost approximately 
$1,000. The validity and scope of design 
patents on screen displays are yet to be 
tested in the courts.) 

Because patent protection on screen 
displays is as yet untested and uncertain, 
this series focuses on copyright aspects 
of intellectual property protection of 
screen displays, to the exclusion of 
other forms of intellectual property. 

Copyrights. The copyright laws seek 
to promote human knowledge by provid¬ 
ing an economic incentive to those who 
create new works of authorship or who 
subsidize such creation by their invest¬ 
ment in creation. (Subsidies include 
paying the salaries of programmers and 
designers of user interfaces for soft¬ 
ware.) The owner of a copyright re¬ 
ceives the legal right to exclude all un¬ 
authorized persons from reproducing 
and marketing the copyrighted work. 

In theory, the right to exclude applies 
only against those who copy the work, 


as contrasted with those who independ¬ 
ently create the work without ever hav¬ 
ing had access to the protected work. 
The distinction is illusory, however, in 
the case of the screen displays of a 
widely marketed computer program. In 
this case everyone in the trade will 
know about the computer program and 
thus will have had access to its screen 
displays. Independent creation is at best 
a theoretical possibility. 

The owner’s exclusive right is called 
a monopoly if you are suspicious of it, 


Paying some economic 
rent to creators and 
developers of new 
technology will mobilize 
inventors to their attics, 
engineers to their 
garages, and investors 
to their wallets. 


or a proprietary right or property right if 
you are enthusiastic about it. However 
you characterize the right, it works by 
giving its owner the legal power to exact 
“economic rent” from the exploitation 
of that right. 

Consider for example, disks contain¬ 
ing the Lotus 1-2-3 spreadsheet program 
or the Ashton-Tate dBase III database 
management program, and the accompa¬ 
nying documentation. Such software 
packages might sell for barely more than 
the cost of the blank diskettes and the 
cost of photocopying the manuals (a few 
dollars instead of a few hundred dol¬ 
lars), were it not for the copyright laws. 
The difference between the going mar¬ 
ket price and those costs represents the 
copyright owners’ economic rent. The 
right to this rent compensates them for 
creating the product and putting it on 
the market, and at the same time pro¬ 
vides an incentive to others to become 
similar entrepreneurs. 

This incentive provides the underly¬ 
ing theory or premise of intellectual 
property law, which applies to patent 
law and copyright law alike. The theory 
is that paying some economic rent to 
creators and developers of new technol¬ 
ogy, in the form of a grant of an exclu¬ 
sive right, helps everyone. It will mobi¬ 


lize inventors to their attics, engineers 
to their garages, and investors to their 
wallets to underwrite creative costs. 
Eventually, the public benefits because 
the result will be a greater amount of in¬ 
novation than would occur in the ab¬ 
sence of the system of exclusive rights. 

It is implicit in the theory that the so¬ 
cial cost will be less than the social 
value of the incremental amount of crea¬ 
tion caused by maintaining the system. 
The social cost of this system is the sum 
total of economic rent paid to creators 
and their investors, in the form of copy¬ 
right royalties, higher prices for copy¬ 
righted products, lost value from hinder¬ 
ing advances by others, or whatever. 
While there is little hard empirical evi¬ 
dence one way or the other on this 
point, customary practice by Congress 
and the courts has made this premise a 
fixture of intellectual property law. It is 
beyond the scope of this series to sub¬ 
ject that premise to further or more rig¬ 
orous analysis. 

Enforcement of copyrights. A US 

copyright owner may enforce its copy¬ 
right only after registering the work 
with the US Copyright Office. Registra¬ 
tion involves submitting a completed 
copyright application form to the Copy¬ 
right Office, providing it with a copy of 
the work or of material identifying it, 
and paying a small fee (ordinarily $10). 
Within a few months the Office will ex¬ 
amine the material for copyrightability, 
in a limited way without the kind of in¬ 
quiry given a patent application. It will 
then issue a certificate of registration of 
a claim of copyright. The copyright 
owner may then sue any copyright in¬ 
fringers in federal court. An infringer is 
a person who reproduces or distributes 
the work without authorization from the 
copyright owner—or, in copyright own¬ 
ers’ preferred parlance, a pirate. 

If a judgment of copyright infringe¬ 
ment is entered against a pirate, the 
copyright owner may seek an injunction 
against continuation of the infringement, 
damages for lost sales, and an award of 
the pirate’s ill-gained profits. The copy¬ 
right owner may also secure an order for 
the destruction of infringing goods and 
for destruction of equipment used to 
commit the copyright infringement, at¬ 
torney fees, and other relief. 

An intentional copyright infringe¬ 
ment, committed for purposes of com¬ 
mercial gain, may subject the infringer 

continued on p. 75 
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Need a simple, 
fast, modular 
system to 
transfer 
messages in a 
multiprocessor? 
Try this low- 
cost duplex 
scheme. 


N. Jagadish 
J. Mohan Kumar 
L.M. Patnaik 

Indian Institute of 
Science 


S uccessful interprocessor communication, a key factor in the design 
of any multiprocessing system, requires high bandwidth and relia¬ 
bility with minimal cost and software/hardware overheads. Here, 
we present such a communication scheme. It features simplicity, 
speed, modularity, and configurability to multiprocessing systems such as 
linear arrays, triangular arrays, meshes, systolic trees, and hypercubes. 

Communication between any two processors in this scheme takes place 
through a common memory, independently accessible by both processors 
involved. The interprocessor interconnection scheme in a multiprocessor 
system directly affects system throughput and has a bearing on the modu¬ 
larity, reliability, and overall system performance. Yalamanchili and Ag- 
garwal discussed the importance of the processor interconnection scheme 
when they characterized the capabilities of a multiprocessing system. 1 

Various interconnection schemes have been suggested for message pass¬ 
ing between processor nodes. 2,3 Tuazon et al. 2 suggested a scheme that 
makes use of first-in, first-out, or FIFO, buffers and several communication 
channels. Their scheme involves data-shifting mechanisms and software for 
polling signals. In this scheme the transfer of a message between two pro¬ 
cessor nodes involves 1) transferring a message to the FIFO buffers in the 
source node, 2) converting a message from words to nibbles, 3) transmitting 
a message from source node to destination node, 4) reconverting the 
message from nibbles to words in the destination node, and 5) receiving 
message data from the FIFO buffer in the destination node after checking 
data-valid flags. 

In the Hayes et al. scheme, 3 processor nodes communicate with one 
another by means of asynchronous direct memory access operations. The 
message moves through serial channels. Transmission involves 1) DMA 
transfer from the main memory to a buffer on the processor node, 2) 
conversion of the message into serial format, 3) transmission on a serial 
communication channel, 4) reconversion of the message into parallel 
format, and 5) another DMA transfer from a buffer to the main memory on 
the destination node. 
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The Intel iPSC, 4 a hypercube supercomputer, con¬ 
sists of eight communication channels per node. Intel 
built the Ethernet protocol-based iPSC using a special 
local communication coprocessor (82586). The iPSC 
scheme transfers data at the rate of 10 Mbits per second. 
Carrier Sense Multiple Access with Collision Detec¬ 
tion (CSMA/CD), a statistical medium access control 
system, implements the sharing of common channels. 

Our scheme for interprocessor interconnection using 
dual-ported RAMs and network controllers follows. In 
this scheme, communication between the processor 
nodes involves writing into and reading from a com¬ 
mon memory area. The communicating processors do 
not have to contend for a common bus as in the case of 
shared-memory systems, since they have independent 
access to the common memory units shared between 
them. Only the memory access time of the processors 
limits the communication speed. Processor-to-pro- 
cessor communication does not use intermediate buff¬ 
ers, input/output ports, or DMAs. We consider the 
example of a three-dimensional cube to illustrate the 
advantages of this scheme. Further, we discuss the 
implementation of the interprocessor communication 
scheme on a 64-node cube configuration. 

Processor-to-processor 

communication 

Dual-ported RAMs now available in VLSI form 
operate at static RAM speeds (50 to 150 nanoseconds) 
and have two independent left and right ports. Figure 1 
illustrates a message transfer between two neighboring 


processors using dual-ported RAMs. This DPR area is 
common to both processor elements (PEI and PE2). In 
other words a portion of the memory space of PEI 
“overlaps” a portion of the memory space of PE2. We 
shall refer to this shared DPR area as the common 
memory. 

Either processor can access the dual-ported RAM 
independently, since this memory area lies in the 
memory space of each processor. However, PEI and 
PE2 access this area with different addresses. On-chip 
arbitration logic within the dual-ported RAM handles 
address contention to ensure maximum speed. In case 
of contention one of the ports must wait until the other 
port’s access is complete; a BUSY signal on the dual- 
ported RAM indicates contention. 

As shown in Figure 1, the common memory between 
PEI and PE2 is logically divided into upper and lower 
halves. PEI writes into the upper half and reads from 
the lower half. Similarly, PE2 writes into the lower half 
and reads from the upper half. In this way we minimize 
the probability of access contention. To transmit a 
message packet to PE2, PEI writes the message packet 
in the common memory it shares with PE2. Communi¬ 
cation between neighboring processors does not in¬ 
volve intermediate devices. A message written by the 
transmitting processor in its own memory is accessible 
to the receiving processor. 

Communication between noncontiguous nodes 
(those not directly connected with each other) can be 
carried out with the help of an intermediate processor. 2 
Figure 2 illustrates the methodology for communica¬ 
tion between noncontiguous processors. A network 
controller (NC) becomes the intermediate link for 
message transfer between noncontiguous nodes. (We 
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Figure 1. Two processors sharing a dual-ported RAM. 


Figure 2. Communication pathways among noncon¬ 
tiguous processors. 
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Figure 3. A 3D cube (a) and a 3D cube with a network 
controller (b). 


show the usefulness of the network controller in hyper¬ 
cube configurations in Figure 3 and discuss it later.) 

In Figure 2, two noncontiguous processors PEI and 
PE3 share separate dual-ported RAMs with the network 
controller. This controller provides an alternative path 
for the transfer of messages between two noncontigu¬ 
ous nodes. As indicated in Figure 2, PEI can transfer 
messages to PE3 along two alternative paths: one path 


via PE2, which involves two memory transfers through 
PE2, and another path through the network controller. 
In the first method, PE2 participates in the message 
transfer between PEI and PE3. 

In a multiprocessing system dedicated PEs perform 
subtasks of a main task. The throughput of the multi¬ 
processing system would be significantly reduced if the 
PEs were used for communication purposes. We in¬ 
clude the network controller seen in Figure 2 to trans¬ 
fer messages between noncontiguous nodes. To trans¬ 
mit a message packet from PEI to PE3, PEI writes the 
message packet into the common memory shared with 
the network controller; this controller performs a block 
transfer to shift the message packet to the common 
memory shared with PE3. 

With this kind of design, the PEs in a multiprocessing 
system need not participate in the communication be¬ 
tween noncontiguous nodes as the network controller 
exclusively performs communication tasks. Obviously, 
processor-to-processor transfer is most effective in the 
case of contiguous nodes. Message transfers between 
noncontiguous nodes must use the path through the 
network controller. Thus the two paths for message 
transfers complement each other. 


Implementation on a 3D cube 

To understand the implementation aspects of our 
scheme, we suggest a three-dimensional cube, since 
such a topology has attracted wide interest among 
researchers in recent years. An /j-dimensional hyper¬ 
cube 5 is a multiprocessor characterized by the presence 
of N = 2" processors interconnected as an n-dimen- 
sional binary cube. Each node of the cube consists of a 
central processing unit and local main memory. Each 
PE of the cube directly communicates to n other PEs of 
the cube; the communication paths correspond to the 
edges of the cube. The length of the path between any 
two nodes is simply the number of edges of the path. 
The minimum distance between any two nodes in an n- 
cube equals the Hamming distance between them. 

Implementing the interconnection network on a 3D 
cube occurs as follows. The nodes of the cube are 
numbered as indicated in Figure 3a and b. Each node 
consists of a PE, which includes the numeric data 
coprocessor. Each PE shares common memory units 
with other PEs located at a Hamming distance of one. 
In addition, the PEs share common memory units with 
the network controller, as illustrated in Figure 4a. 
Figure 4b shows the memory map of a typical node 
processor. 

As can be seen in Figure 4c, the processing node 
contains an 8088 processor with an 8087 numeric data 
coprocessor, address decoding logic, a wait-state gen¬ 
erator, system ROM, local RAM, and dual-ported 
RAMs. The address decoding logic selects among the 
system ROM, local RAM, and the dual-ported RAMs 
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Figure 4. Common memory units at a typical node called PEO (a); a memory map of PEO (b); node processor hard¬ 
ware (c); and a single-wait-state generator (d). 

October 1989 13 



















































































































































































Interprocessor communication 



CC Central controller 


Provision for extension in 
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Figure 5. Common memory units at a network controller (a) and a memory map of the network controller (b). 


shared between the neighboring processors and the 
network controller. The wait-state generator takes care 
of address contention as follows. On detection of a 
BUSY signal from the dual-ported RAM, the wait-state 
generator disables the ready-line input RDY1 of the 
8284 clock generator for one clock state. See Figure 4d. 

The network controller is dedicated to the task of 
overall interprocessor communication management. 
The controller shares common memory units with each 
of the eight nodes in the cube, as illustrated in Figure 5a 
and b, and its hardware configuration is similar to that 
of the nodes. As indicated in Figure 6, the network con¬ 
troller contains parallel and serial ports for communi¬ 
cation with the host system and other input and output 
devices. In addition, the network controller initializes 
the cube and distributes tasks. 

Common memory units exist between all pairs of 
neighboring nodes and between the network controller 
and each node. As mentioned earlier, neighboring 
nodes communicate by directly writing into the com¬ 
mon memory located between the two nodes. For 
communication between nodes located at a Hamming 
distance greater than one, the network controller per¬ 
forms a memory block transfer from the common 
memory shared with the transmitting node to the com¬ 
mon memory shared with the receiving node. A mes¬ 
sage packet between two noncontiguous nodes can also 
be routed through one of the parallel paths between the 


two nodes depending on the availability of the pro¬ 
cessors. 5 The parallel paths between two noncontigu¬ 
ous nodes may consist of one or more nodes that 
contribute to the message transfer by block transferring 
the message packet from the memory space of the 
transmitting node to the memory space of the receiving 
node. 


Message transfer protocol 

The message packet shown in Figure 7a consists of 
the semaphore/address byte, packet-size byte, and the 
actual message. The semaphore/address byte (Figure 
7b) has three subdivisions. The most significant bit 
indicates valid data, the next bit indicates processor- 
busy status, accompanied by three bits for addressing 
processors in an extended hypercube, and the last three 
bits indicate the address of the node. The next byte 
gives the total length of the message in bytes, followed 
by the message itself. The source node checks the 
semaphore bits for data validity and writes the message 
packet either in its common memory shared with the 
destination processor (if the Hamming distance be¬ 
tween them is one), or in its common memory shared 
with the network controller (if the Hamming distance 
between the source and destination nodes is greater 
than one). 
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Figure 6. Block diagram of the multiprocessor system 
with peripherals. 
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Figure 7. A message packet (a) and a semaphore/ad¬ 
dress byte (b). 


As indicated earlier in Figure 3b, PE3 and PE7 (with 
a Hamming distance of one between them) share a 
common memory through which they can communi¬ 
cate. The source processor checks the data valid V-bit. 
If the V-bit is 0, the source processor writes the mes¬ 
sage in the common memory space and sets the V-bit to 
1 to indicate the presence of fresh data. The destination 
processor checks the V-bits in the common memory 
units shared with its neighbors. If any of the V-bits are 
valid, the processor copies into the local memory the 
message from the common memory following a valid 
V-bit, and resets the V-bit to 0. 

The message transfer between two noncontiguous 
nodes involves a path through the memory space of the 
network controller. For example, consider the commu- 


Procedure Initialization for network controller 
for PEO to PE7 
begin 

V-bit := 0; B-bit :^= 0; 
end; 

Procedure Network Controller Block Transfer 

repeat 

for PEO to PE7 
begin 

if ((S-V-bit) = 1) AND ((D-V-bit) = 0) then 
block transfer data; 

S-V-bit := 0; D-V-bit := 1; 

end; 

forever. 

Procedure Send /* for PE */ 

begin 

if hamming distance >1 then 
block transfer data to network controller; 
network controller-V-bit := 1; 
else 

block transfer data to destination; 

D-V-bit := 1; 

end; 

Procedure Receive /* for PE */ 
begin 

if S-V-bit = 1 
then 

block transfer data; 

S-V-bit := 0; 

end; 

/* B-bit is busy bit 

S-V-bit is source data validity bit 
D-V-bit is destination data validity bit 
Network controller V-bit is network controller data 
validity bit */ 


Figure 8. The procedure for a message transfer. 

nication procedure between PE3 and PE4, which are at 
a Hamming distance of three from each other. The 
source processor PE4 determines the Hamming dis¬ 
tance between itself and the destination processor PE3. 
Since the Hamming distance is greater than one, PE4 
writes the message into the common memory space 
shared with the network controller if the corresponding 
V-bit is reset to 0. The network controller checks the V- 
bits of the control bytes in the common memory units 
shared between PE4 and PE3. Say the V-bit of PE4 is 1 
(indicating fresh data) and the V-bit of PE3 is 0 (indi¬ 
cating previous data accepted). In this case, the network 
controller transfers a memory block to shift the data 
stored in the common memory space shared with PE4 to 
the common memory space shared with PE3. The pro¬ 
cedure for message transfer appears in Figure 8. 
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Scheme extended to 64 nodes 

The 64-node extended hypercube, or EH, consists of 
eight 3D cubes 5 and a central controller node, as illus¬ 
trated in Figure 9. (We introduce the EH term to reflect 
that each node of the hypercube is a cube by itself.) 
Each 3D cube consists of eight individual nodes and the 
network controller, and we refer to this group as the 
EH-node (node of the EH). EH-nodes appear at the 
vertices of the EH. Each of the eight EH-nodes has 
topological and architectural features similar to that of 
the 3D cube discussed earlier. 

As indicated by the dotted lines in Figure 9, the eight 
network controllers at the eight EH-nodes form a 3D 
cube, the EH. The network controller of each EH-node 
shares common memory units with its neighboring 
network controllers in the EH. In addition, a central 
network controller shares common memory units with 
all the eight network controllers at the vertices of the 
EH. There is no interconnection network between the 
individual nodes of different EH-nodes. 



Figure 9. An extended hypercube. 
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The interprocessor communication scheme between 
individual nodes of the EH-node is similar to that 
explained earlier, as is the communication scheme 
between any two network controllers in the EH consist¬ 
ing of the network controllers and central controller. 

A message between two individual nodes in different 
EH-nodes transmits via the memory space of the net¬ 
work controllers residing in the EH-nodes of the source 
and destination nodes. A message transfer between two 
individual nodes having a Hamming distance of six 
between them (and residing in two different EH-nodes 
that are themselves at a Hamming distance of three 
from each other) completes with just three memory 
transfer operations. No individual node processor 
(other than the source and destination node processors) 
participates in the memory transfer operation. Memory 
transfers can best be explained by considering two such 
nodes shown in Figure 9, PEO and PE63, as source and 
destination nodes respectively. Messages transfer as 
follows: 

1) PEO writes the message packet in its common 
memory shared with NCO, 

2) NCO transfers the message packet to the common 
memory shared with the central controller, 

3) the central controller performs another memory 
block transfer to shift the message packet to the com¬ 
mon memory shared with NC7, and 

4) NC7 performs a final memory transfer to place the 
message packet in the memory space of the destination 
node. 

The SSS bits in the semaphore/address byte as shown 
in Figure 7b indicate the address of the EH-node. The 
network controllers in each EH-node keep track of the 
busy status of the individual PEs, and the central con¬ 
troller keeps track of the busy status of the network 
controllers. 


Advantages of the scheme 

The dual-ported RAM scheme presents a cost-effec¬ 
tive method for data transfer between processor nodes 
in a multiprocessing system. Tuazon et al. discussed a 
scheme that yields a data transfer rate of 1.5 Mbits/ 
second. Hayes et al. discussed another scheme using 
DMAs and serial channels with a transfer rate of 1 
Mbyte/s. During DMA transfers, though, the processor 
must remain idle until the DMA transfers complete. 
Software overhead may further reduce the effective 
data transfer rate. The CSMA/CD scheme employed by 
the iPSC cube offers a data transfer rate of 10 Mbits/s, 
but has overhead related to the special communication 
coprocessor and its related initialization and control 
software. 

In our scheme, any pair of processors of a hypercube 
can establish two-way communication. A processor 
can receive messages from three of its neighbors and 
































































Table 1. 

Data transfer rates for different processors. 


Clock 

Key 

Overheads Typical transfer rates 

Processor 

(MHz) 

instruction 

(incl. states) 

(Mbytes/s) 

8088 

10 

REP MOVS 

33 

0.58 

8086 

10 

REP MOVS 

33 

1.17 

80286* 

10 

REP MOVS 

19 

5.00 

80386* 

16 

REP MOVS 

21 

16.00 

68000 

12.5 

MOVE.l (al)+,(a2)+ 

50 

1.78 

*In real-address mode 





Table 2. 

Instructions for data transfer. 

Instruction 

Clock states 
required 

MOV AX, DATA SEG 

4 

MOV DS, AX 

2 

MOV AX, EXTRA SEG 

4 

MOV ES, AX 

2 

MOV CX, LENGTH_OF_PACKET 4 

MOV SI, SOURCE POINTER 

4 

MOV DI, DESTN POINTER 

4 

REP MOVS 

9 + 17(n) 

Total no. of clock states 

33 + 17(n) 

n is the number of byte transfers. 



the network controller and send a message to one of its 
three neighbors or the network controller simultane¬ 
ously. In other words, at a given time four communica¬ 
tion paths of a PE can be active. One of these four can 
be a two-way communication path. In an 8-node cube 
with 20 memory units, nine paths can be active at any 
given time. For example, in Figure 3b PE5 can receive 
messages from three of its neighbors (PEI, PE4, PE7) 
and the network controller. PE5 can send a message to 
one of its neighbors, while other PEs (PEO, PE2, PE3, 
PE6) can have four active communication paths among 
them. 

We implemented the dual-ported RAM scheme with 
Intel 8088s as node processors because of the availabil¬ 
ity of hardware/software development tools and the 
hardware’s low cost. The high-speed communication 


technique has advantages in a multiprocessing system. 
The technique can be adopted for communication in 
multiprocessing systems based on advanced micropro¬ 
cessors like Intel’s iAPX 80286 and 80386 and Motor¬ 
ola’s 68000, 68020, and 68030. Table 1 lists expected 
zero-wait-state data transfer rates when using typical 
instructions. This transfer rate is dependent on the bus 
bandwidth and the type of instructions available. With 
an 8088 processor operating at 10 MHz, we obtained a 
zero-wait-state transfer rate of 0.588 Mbytes/s (4.7 
Mbits/s) in both directions (duplex). This transfer rate 
from one PE to another is computed as shown in Table 
2. The REP MOVS string operation essentially achieves 
a block move of data from one part of memory to 
another. 

The software overhead for initializing the various 
registers involves 33 clock states. The transfer rate 
actually depends on the REP MOVS instruction, which 
takes 17 clock states per transfer of a byte in the case of 
the 8088. This, when computed for a processor operat¬ 
ing at 10 MHz, yields 0.588 Mbytes/s or 4.7 Mbits/s. 
The speed improves significantly if processors with 
wider data bus widths and higher clock frequencies are 
used. 

Our fully duplexed, asynchronous, and zero-buff¬ 
ered communication scheme handles messages that are 
less than the maximum allowable packet size. Pro¬ 
cessor nodes operating at different speeds and different 
word lengths could be combined in the same multi¬ 
processor system. The highly optimized dual-port tech¬ 
nique allows the same memory to be used as working 
storage and for communication between nodes, avoid¬ 
ing the need for any special data communication con¬ 
troller. Message transfer is transparent to the user 
programs running on the nodes because no special 
communication channel must be set up and no need 
exists to keep track of packet sequence. We further 
reduce software overhead in that we do not need ac¬ 
knowledgment packets for memory-to-memory trans¬ 
fer. Advanced processors with higher addressing capa- 
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bility can support communication channels with larger 
sizes of dual-ported common memory and hence im¬ 
prove the throughput. 

In an n-cube configuration each individual node 
connects to n neighboring nodes. 5 A message transfer 
operation between any two nodes with a Hamming 
distance of n involves (n - 1) processor elements and 
transmission on n links. We discussed the implementa¬ 
tion aspects of our scheme on an eight-node (2 3 ) cube 
and the extension of the scheme to a 64-node cube (2 6 
cube) configuration. With the help of the dual-port 
technique and the use of the network controllers, a 
message can transfer between any two nodes in a 2 3 
cube with a maximum of two memory transfers, even if 
the Hamming distance between the nodes is three. In 
the 64-node architecture, we achieve a message trans¬ 
fer between any two communicating nodes with a 
maximum of three memory transfer operations. A 
message can also be transmitted from a source node to 
a destination node through one of the several parallel 
■paths consisting of the PEs, network controllers, and 
the central controller, depending on their availability. 

T he dual-ported RAM approach for message trans¬ 
fer between nodes in a multiprocessor system 
offers cost and speed advantages. The extended 
hypercube is an example of a low-cost, compact multi¬ 
processor system with minimal software and hardware 
overheads. With an 8088 processor operating at 10 
MHz, we have achieved a data transfer rate of 0.588 
Mbytes per second (4.7 Mbits per second). H 
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SPECIAL FEATURE 


Implementation and 
Performance of the 
Fast Hartley Transform 


You can cut 
computation 
time by 

substituting this 
easy DSP tool in 
general 
applications 
that involve real 
signals. 


T he Fourier transform, an important tool in digital signal processing, 
allows signals to be treated in the complex frequency domain. In 
most applications, however, the signals in the time domain are real, 
causing the Fourier transform to contain redundancy. 1 

In 1942 R.V.L. Hartley 2 formulated a real integral transform, which later 
on R.N. Bracewell, 3 O. Buneman, 4 and others extensively investigated. The 
Hartley transform, because it works with real instead of complex numbers, 
reduces computation time and thus becomes a potential substitute for the 
Fourier transform in general applications. 

With this in mind, we investigate the implementation aspects of the fast 
Hartley transform, in both software and hardware. We describe the modifi¬ 
cations required to convert existing fast Fourier transform (FFT) programs 
to execute fast Hartley transforms (FHT), showing the ease with which 
these modifications can be implemented. We compare execution time and 
memory storage requirements of both transforms and present power spec¬ 
trum calculation and convolution as illustrative examples to compare the 
performance of the two transform techniques. We also survey the compara¬ 
tive performance of various microprocessors and digital signal processors 
in FFT and FHT computation. 


Tho Le-Ngoc 
Minh Tue Vo 
Condordia University 
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Overview 


Bracewell defines the Fourier transform of a signal f{t) as 5 
F(to) = f f(t)e-> a ' dt 

* — oo 




/(0( cos 


cor - j sin cot) dt 


(1) 


where t is the time-domain variable, and co is the frequency-domain vari¬ 
able. The inverse Fourier transform is given by 

f(t) = l/(2rc) j + F(co)e +Jm dco 


0272-1732/89/1000-0020$01.00 © 1989 IEEE 








When the input signal f(t) is real, the Fourier trans¬ 
form will be Hermitian; 1 that is, F(- ca) = F*(co), where 
the superscript * denotes complex conjugation. There¬ 
fore, since half the transform is redundant, the trans¬ 
form time can be reduced. 

To exploit this symmetry and redundancy in the 
Fourier transform of real signals, Hartley 2 introduced 
the following transform: 

r + °° 

//(co) = J f(t) cas co t dt 

( 2 ) 

/(f)(cos cof + sin cor) dt 

This transform is in some way a real version of the 
Fourier transform, with the complex exponential func¬ 
tion replaced by the cas function (cas x = cos x + sin a). 
Bracewell 13 gives us the inverse Hartley transform as 

r+ °o 

/(f) = 1/(2 jt) //( co)(cos cot + sin cof) d co 

■ — oo 

This is the same form as the forward Hartley transform 
in Equation 2, except for a scaling factor. 

The simple identity j = - ((1 -/)/(1 + j) } yields 

e -jw _ cos _ j s j n ^ 

= 1(1 + j) cos cof + (1 — j) sin cof }/(l +j) 

Using this relationship and Equations 1 and 2, we can 
show that 

F(co) = (//(co) + jH(- co) 1/(1 +j) 
which we can also rewrite to yield 
F(co) = [{//(©) + //(- co) 1/2] 

— j[{//(co) + //(- co)}/2] (4) 

= // £ (co) - jH 0 (a>) 

Hence, /?e(F(co)( = // £ (co), the even part of //(co), and 
Im {F(co)} = - H 0 ( co), the negative odd part of //(co). 
Since the Fourier transform can be recovered from the 
Hartley transform, no information is lost if we use the 
Hartley transform for a real input signal instead of the 
Fourier transform. 


for k = 0, ..., A - 1, and the inverse transform is 
N- 1 

x(n) = ( 1/AO yT F(k)e‘ 2nl: " IN 
k = o 

If the signal x(n) is real, its DFT is Hermitian; that is, 
F(N- k) = F *{k) for k = 0, ..., (N/2) - 1. We can exploit 
this property to halve the DFT computation time. 
However, since the DFT of a real signal is generally 
complex, and since this complex DFT is the input to the 
inverse DFT, the inverse transformation time cannot be 
reduced by the Hermitian property, even for a real 
signal. 

We can define the discrete Hartley transform (DHT) 
of a real signal x(n) as follows: 1 

N- 1 

H(k) = x(n) cas (2 nkn)/N (6) 

n = 0 

forZ = 0, ..., A- 1. We have replaced complex multi¬ 
plications in the DFT formula with real multiplications 
in the DHT. We reduce the computation time just as if 
we had used the Hermitian property of the DFT. 

According to Bracewell, 13 we achieve the inverse 
DHT by 

N- 1 

x(n) = (1/N) ^ H(k) cas ( 2nkn/N) (7) 

k = o 

Note that Equations 6 and 7 have exactly the same 
form, except for a scaling factor of 1/A. As a conse¬ 
quence, we can use any computation unit (software 
program or hardware device) capable of producing the 
DHT of a signal to calculate the inverse DHT as well. 
(We do this by including a multiplication by 1/A). The 
fact that we have now reduced the computation time 
(compared to the complex DFT) in both the forward and 
the inverse transformations further accentuates this 
advantage of the DHT over the DFT. Equations 3 and 4 
still hold if the continuous variable co is replaced by the 
discrete variable k: 

F{k)={H{k)+jH{-k))I(\ +j) (8) 


The discrete transforms. In digital signal pro¬ 
cessing we convert the signal into a sequence of dis¬ 
crete samples. If the sampling period is T, a time signal 
x(t) in discrete form becomes x{nT), where n is an 
integer. (Usually we drop the sampling period T if there 
is no confusion.) We can therefore redefine the Fourier 
and Hartley transforms in discrete form. 

We achieve the discrete Fourier transform (DFT) of 
a sampled signal x(n), n = 0, ..., A - 1, by 6 

N- 1 

F{k) = '^x(n)e~ i2nknlN (5) 

n - 0 


= H E (k) - jH a (k) (9) 

provided F(k) and H(k) are considered periodic with 
period A, so that //(- k) = H(N - k). 

The fast algorithms. If we use Equation 5 directly to 
calculate the DFT, we must perform A complex multi¬ 
plications for each of the A values of k. Hence, the 
computation time is proportional to A 2 . It is possible to 
reduce this time considerably; we call the collection of 
fast algorithms to compute the DFT the fast Fourier 
transform. 6 

Suppose that A, the number of samples, is a power of 
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2, say 2 p . We can rewrite Equation 5 as 


F(k) = F E (k) + e~ JMIN F 0 (k) 

( 10 ) 

F\k + (N/2)} = F E (k) - e-‘ MIN F 0 (k) 

( 11 ) 


for& = 0, ..., (N/2) - 1, where 

(M2) - 1 

F E (k) = ^ x(2n)e~ i2nK " (Nn) 

n = o 
(M2)- 1 

F 0 (ifc) = ^ x(2n + i)g-^*w ( w> 

n = 0 

We can see that F E (k) and F 0 (k) are the N/2-point 
DFTs of the even-numbered and the odd-numbered 
samples, respectively. The above relationship provides 
a way to break the original DFT into smaller trans¬ 
forms. We can repeat this recursive procedure until 
only 2-point transforms are left. We call the resulting 
algorithm a radix-2, decimation-in-time (DIT) FFT. 

The above algorithm breaks up the transform into a 
binary tree pattern and thus consists of log 2 N stages. In 
each stage, the number of operations needed to com¬ 
bine pairs of transforms is proportional to N. Hence, we 
can complete an N-point transform in 0(Nlog 2 N) time, 
which is considerably lower than 0(N 2 ). O denotes the 
order of {Nlog 2 N}. If the Hartley transform is to com¬ 
pete with the Fourier transform, an 0(Mog 2 N) algo¬ 
rithm must be developed to compute the DHT. 

If we substitute Equation 8 into Equations 10 and 11, 
we obtain (after canceling the common denominator 
1 +;): 


H(k) + jH(N - k) = 

[H E (k)+jH E {(N/2)-k}] (12) 

+ e-»* klN [H 0 (k) + jH 0 ((N/2) - k\] 


H{(N/2) + k] + jH{(N/2)-k) = 

[H E (k) + jH £ {(N/2) - k}] (13) 

- e-** k/N [H 0 (k) + jH o m2) - k}] 

for k = 0, ..., (N/2) - 1, where H E (k) and H 0 (k) are the 
N/2-point DHTs of the even-numbered and the odd- 
numbered samples, respectively. Note that we replace 
- k by (N - k) in an /V-point transform and by {(N/2) - k ) 
in an N/2-point transform, because H(k) is considered 
periodic. 

We can eliminate the complex expressions in Equa¬ 
tions 12 and 13 by equating the respective real and 
imaginary parts on both sides of each equation. The 
result is four new equations that we can combine to 
yield 

H(k) = H E (k) + [H 0 (k) cos (2 ti k/N) (14) 

+ H 0 {(N/2) -&} sin (2nk/N)] 



Figure 1. The butterfly structure. 


H((N/2) + k} = H E (k) - [ H 0 (k) cos (2 k k/N) 1 g 

+ H 0 {(N/2) - k) sin (2nk/N)] 

where k = 0, ..., (N/2) - 1. Equations 14 and 15 show 
that, like the DFT, the DHT can be recursively broken 
up into smaller transforms, and therefore a fast algo¬ 
rithm can be devised to compute it. By analogy, we call 
the fast algorithm a fast Hartley transform. 1 


FHT implementation 

We can use the recursive formulae just discussed to 
implement an FHT program. However, the similarities 
between the FFT and the FHT, as well as the existence 
of numerous good FFT programs, sometimes make it 
simpler and more efficient to modify an existing FFT 
program to perform FHT. The following presents a 
method described by Buneman 4 to convert from FFT 
to FHT. This method works for any radix-2, DIT imple¬ 
mentation. 

We usually implement the radix-2, DIT FFT algo¬ 
rithm in the following way. We store the N input data 
values in an array. The program goes through p passes 
(N = 2 p ), operating on the array elements in each pass; 
at the end of the last pass, the array contains the DFT 
results. To have the output values come out in the 
proper order, we must perform a process called bit- 
reversal permutation, either at the beginning or at the 
end of the algorithm. However, since this data permu¬ 
tation applies to the FFT and FHT in exactly the same 
way, we did not consider this in our conversion 
discussion. 

In each of the p passes, the FFT program operates on 
pairs of data points in the manner dictated by Equations 
10 and 11. We can summarize this operation in the 
structure called a butterfly (Figure 1). The butterfly 
produces two output array elements Y(n) and Y(n + k) 
from two input array elements X(n) and X(n + k). In the 
butterfly, the X's and T’s are complex (in an in-place 
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Figure 2. The FHT butterfly. 


Figure 3. Interpreting the negative indices. 


implementation, X and Y designate the same array). 
Such butterflies are collected together in groups of k 
consecutive butterflies. (If the passes are numbered 0 to 
p — 1, then k = 2' in pass i). The first butterfly group in 
each pass contains butterflies corresponding to n - 0, 
..., k-l. 

The butterfly in Figure 1 means the following: 

ReY(n) <— ReX(n) 

+ {ReX(n + k) cos 0 (16) 

+ ImX(n + k) sin 0} 

ImY(n) <— ImX(n) 

+ [ImX(n + k) cos 0 (17) 

- ReX(n + k) sin 0) 

ReY(n + k) <- ReX(n) 

~{ReX(n + k) cos 0 (18) 

+ ImX(n + k ) sin 0} 

/wT(n + k)<- ImX(n) 

- { lmX(n + k) cos 0 (19) 

- ReX(n + k) sin 0) 

where 0 is the phase angle of the butterfly {0 = (2nk/N) 
for some k}. 

Substituting Equation 8 into the above yields the 
FHT butterfly (with m = n + k) in Figure 2. The FHT 
butterfly produces real outputs W(n ), W(- n), W(m), 
W{- m) from the V inputs. This butterfly represents the 


following equations: 



W(n) «- 

V(n) + 

| V(m ) cos 0 + V(- m) sin 0) 

(20) 

IT(— n) <— 

V(- n) + 

( V{- m) cos 0 - V(m) sin 0} 

(21) 

W{m) <r- 

V(n)- 

[V(m) cos 0 + V(- m) sin 0} 

(22) 

W{- m) <- 

V(- n) - 

{ V(- m) cos 0 - V(m) sin 0} 

(23) 


The similarities between the two sets of Equations 16 


through 19 and 20 through 23 suggest the following 
relabeling of the array elements: 


Re X(n ) = 

V(n ) 

Im X(n) = 

F(- n) 

Re X(m) = 

V(m ) 

Im X(m) = 

V(- m) 

Re Y(n ) = 

W(n) 

Im Y(n) = 

W(- n) 

Re Y(m) = 

W(m) 

Im Y(m) = 

W(- m) 

With this relabeling scheme, Equations 16-19 and 


20-23 become identical. Therefore, we can easily 
modify the code to compute an FFT butterfly. We can 
compute an FHT butterfly by reindexing the array 
elements used in the butterfly to implement the relabe¬ 
ling just described. 

To resolve the negative indices, remember that the 
butterfly has width k = m - n. Hence, the signal values 
contained in V and W are considered periodic, with 
period k on the left side (V) and 2k on the right side ( W ). 
We interpret the negative indices according to the 
diagram in Figure 3. For instance, if n < k (that is, the 
butterfly is in the first group of a pass), - n = k - n 
and -m = 2k- n on the left side, while - n = 2k- n and 

- m = k - n on the right side. 

An exception must be noted in the first butterfly of 
each group, where the phase angle 0 is zero. Since 

- 0 = 0, we interpret the negative indices as shown in 
Figure 4 on the next page. 

When the phase angle is 0 or n/2, the indices - n and 
n point to the same array element. Therefore, we must 
carry out only half the computations, either for the “real 
part” or for the “imaginary part.” 

This indexing scheme preserves the equations of the 
butterfly, but resolves the negative indices differently 
on the input side (left) and the output side (right). In an 
in-place implementation, V and W represent the same 
array. Thus it is more advantageous to relabel the V’s 
and the W’s in a uniform manner. For instance, in the 
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Figure 4. Interpreting the negative indices for the 
first butterfly of each group. 


first group of a pass, we should have - n = k - n and 
- m = 2k- non both sides of the butterfly. In this new 
scheme, we exchange the W values in Equations 21 and 
23. Hence, one addition and one subtraction in those 
equations trade places, so that these equations no longer 
correspond to Equations 17 and 19 but to the following: 

ImY(n) <— ImX(n) - ( ImX(n + k) cos 0 (17') 

- ReX(n + k) sin 0} 

ImY(n + k) <— lmX{n) + { lmX(n + k) cos 0 

- ReX(n + k) sin 0) 

Either reindexing scheme can be used in an FFT-to- 
FHT conversion. 

Although the FFT and FHT butterflies look very 
much alike, the number of butterflies to be calculated is 
not the same for both algorithms. In an FFT pass, each 
k butterfly in each group must be evaluated. On the 
other hand, the first { (k/2) + 1} FHT butterflies in each 
group cover the remaining array elements in the group 
as well, through the terms with negative indices. For 
instance, in the first group of a pass, when n runs from 
0 to 7/2, the values {(k/2) + 1}, ..., ( k - 1) will be taken 
care of by k - n (which represents - n). Thus, we must 
compute only butterflies with phase angles in the inter¬ 
val (0, (n/2)) in an FHT program. 

Execution speed. In a group of k butterflies, we 
compute only (k/2) + 1 FHT butterflies. Compare this to 
the previously discussed k butterflies in the FFT case. 3 
Moreover, two of these butterflies correspond to phase 
angles of 0 and k/2, and contain only half as many 
operations as do their FFT counterparts. Therefore, we 
can compute a group of FHT butterflies twice as fast as 
a group of FFT butterflies. It follows that the FHT 
execution time is only half the FFT execution time. 

An FFT program written in assembly language for 
the IBM PC and converted to an FHT program in the 
manner described earlier yielded timing results that 
confirmed the theoretical prediction. A 1,024-point 
FFT executed in 71.23 milliseconds on a 10-MHz IBM 
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PC AT, whereas a 1,024-point FHT took 35.78 ms on 
the same machine for an FFT-to-FHT ratio of 1.99. 

It should be remembered that an FHT is twice as fast 
as a complex-input FFT, but no faster than a real-input 
FFT based on the Hermitian property. The speed advan¬ 
tage of the FHT is only apparent in the inverse transfor¬ 
mation, in which we must use a complex FFT in any 
case. 

Memory storage. The Hartley transform yields real 
outputs, while the Fourier transform yields complex 
outputs. Hence, an FHT program requires only half the 
array space of an FFT program, for the same size of 
transform. Also, the FHT algorithm only needs cosine 
and sine values for angles from 0 to (k/2), while the FFT 
needs angles up to 7t. This represents an extra saving in 
memory space for programs that keep cosine and sine 
values in tables. 

Usefulness in applications. The two most important 
and widely used applications of the FFT in digital 
signal processing are power spectrum calculation and 
convolution. We can perform these computations with 
the FHT as easily as with the FFT. 

The power spectrum of a signal having Fourier 
transform F is given by 

P = ( ReF ) 2 + (ImF) 2 

Substituting Equation 9 into this yields 

P(k) = (H(k) 2 + H(- k) 2 )/2 

where H(k) is the Hartley transform of the signal. Thus, 
we obtain the power spectrum directly from the FHT 
outputs and with the same speed as when the FFT is 
used. 

Two power spectrum programs, one using the FFT 
and the other using the FHT for 1,024 data points, 
yielded the following timing results on a 10-MHz IBM 
PC AT. The time spent on calculating the power spec¬ 
trum from the transformed data was 31,768 cycles for 
the FFT and 34,830 cycles for the FHT. Since a 1,024- 
point FHT executes on the machine in 35.8 ms (as 
reported earlier), and a real-input FFT has the same 
speed, it follows that the total computation time was 
approximately 39 ms for the FFT and 39.3 ms for the 
FHT, about the same for both transforms. 

Consider the simple linear system displayed in Fig¬ 
ure 5. The output signal g(f) is given by 

g(t) =f(t) * h(t) 

= F-MG(co)) 

where 

G(co) = //(co)F(co); 

F(t o) = F 1/0)1; 

//(CO) = F{h(t)\; 














F and F~ ] denote the Fourier transform and its inverse, 
respectively, and * denotes convolution. The deriva¬ 
tion of g(t) requires two Fourier transforms/(f) and h(t) 
and one inverse Fourier transform G(co). 

If two signals x t (n) and x 2 (n) are convolved to yield 
x } (n) = x ,(«) * x 2 (n), the Fourier transform of x 3 (n) 
becomes F } (k) - F x (k)F 2 {k). Here, F { (k) and F 2 (k) are 
the Fourier transforms of x x {n) and x 2 (n), respectively. 
Replacing the real and imaginary parts of the Fourier 
transforms with the even and odd parts of the Hartley 
transforms as in Equation 9, we can show that 1 

H/k) = 1/2 \H x {k)H 2 (k) - //,(- k)H 2 {- k) 

+ H t (k)H 2 (-k) + H ] (-k)H 2 (k)} ( 24 ) 

= HjwUk) + //,(- k)H 20 (k) 

= H 2 (k)HJk) + H 2 (- k)HJk) ( 25 ) 

where the H's are the Hartley transforms of the corre¬ 
sponding signals, and the subscripts E and O denote the 
;ven and odd parts, respectively. Although the convo¬ 
lution formula for the Hartley transform looks more 
complicated than the one for the Fourier transform, it 
consists of the same number of real multiplications as 
does the complex multiplication in the other formula, 
and only one extra real addition. Moreover, in many 
applications, the signals are usually even or odd. In 
these cases, one half of Equation 24 or Equation 25 
vanishes, making it possible to perform convolution 
with one real multiplication of the Hartley transforms. 1 - 3 

Convolution provides a good means to compare the 
FFT and the FHT in applications, since it involves both 
the forward and the inverse transformations. As an 
example, consider a discrete version of the system 
shown in Figure 5, with two forward and one inverse 
transformations. Assume that the computation time for 
a general DFT is T dft , and that a multiplication in 
frequency domain is T MVL - The computation time for the 
output g(n) is thus 3 T dft + T MUL , in general. If the input 
signal is real and the Hermitian property is used to 
reduce the forward DFT time to 0.5 T dft , we can 
compute g(/?) in 2 T dft + T mL . Usually T UUL « T DFr 
Hence, the Hermitian property allows us to reduce the 
convolution computation time by a 2/3 factor. 

If we use the DHT in the previous example, we can 
reduce the computation to 0.5 T DFT in both the forward 
and inverse transformations, resulting in a total compu¬ 
tation time of \.5T dft + J MUL . This time is approxi¬ 
mately one half of that for the general (complex-input) 
DFT and three quarters of that for the real-input DFT. 

Two convolution programs, one using the FFT and 
the other using the FHT for 1,024 data points, yielded 
the following timing results on a 10-MHz IBM PC AT. 
Multiplying the transforms of the input signals took 
82,577 cycles for the FFT version and 70,083 cycles for 
the other. The FHT computations (forward and in- 


Hty 



9 (0 


Figure 5. A simple linear system. 

verse), as well as the forward FFT computations, took 
about 35.8 ms; the inverse FFT time was about 71.2 ms. 
Since there are two forward and one inverse transfor¬ 
mations, the total execution time was about 114.4 ms 
for FHT and 151.1 ms for FFT. This gave us an FHT-to- 
FFT ratio of 0.76, which is very close to the estimate of 
0.75 given earlier. 

It seems disappointing that the FHT does not reduce 
the execution time by half. However, remember that 
both the forward and the inverse FHT call for the same 
routine. In addition, the FFT must use two separate 
routines for the forward and inverse transforms if real 
input signals are to be treated differently, rendering the 
program more complicated. On the other hand, an FFT 
program with the capability to switch automatically 
between real-input and complex-input versions would 
be more general than an FHT program, since it would be 
able to handle any kind of signals. 

Hardware implementation. We based a paper de¬ 
sign on an existing FFT processor to see how easy it 
would be to switch from the FFT to the FHT in hard¬ 
ware. We arbitrarily chose the 6-cycle butterfly FFT 
processor presented by R.J. Karwoski 7 for this job. It 
turned out that hardware components and structures 
designed for FFT can be used to implement FHT with 
only minor modifications. 

A cycle count reveals that the FHT processor would 
take exactly half as many clock cycles as its FFT 
ancestor to complete a transformation. The FFT pro¬ 
cessor computes each of the (/V/2)log,/V butterflies in 
six cycles. The FFT took 3/Vlog,/V cycles, and the FHT 
thus took 1.5/Vlog 2 /V cycles. Since we used a clock 
period of 100 nanoseconds, a 1,024-point FFT took 
3.07 ms and a 1,024-point FHT took 1.54 ms. 

Comparative performance on various systems. 

Table 1 on the next page lists a survey of the execution 
times of many FFT and FHT software and hardware im¬ 
plementations, to give an idea of the performance 
attainable with the present technology. For the FFT we 
include some timing values that are actual measure¬ 
ments performed at Concordia University and some 
from benchmarks obtained from other sources. Only 
FHT values for the Intel 86 microprocessor family 
reflect actual measurements. We took the others to be 
one half of the correspondent FFT values, since 
Bracewell 1 - 3 has proven this in theory, and we have 
shown it in practice here to be the case. 
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Table 1. 


The 1,024-point FFT and FHT execution times. 



Clock 

Complex FFT 

FHT time 

Manufacturer 

Processor 

(MHz) 

time (ms) 

(ms) 

Software 





Intel Corp. 

8086* 

8 

292.8 

144.6 


80286* 

10 

71.2 

35.8 


80386* 

16 

43.1 

21.4 

Texas Instruments 

TMS32020** 

5 

31.8 

15.9 


TMS320C25t 

10 

13.2 

6.6 


TMS320C30j: 

16.7 

3.75 

1.88 

Analog Devices 

ADSP-2100t 

8 

7.6 

3.8 

National Semiconductor 

LM32900+ 

10 

13.4 

6.7 

NEC 

PD77230t 

6.7 

10.8 

5.4 

Phillips/Signetics 

DSPlf 

8 

6.85 

3.43 

Motorola 

DSP56000§ 

10 

4.99 

2.5 

Hardware 

Design based on Karwoski 7 

10 

3.07 

1.54 


Sequential 100-ns butterflies^ 


0.512 

0.256 


Cascaded 100-ns butterflies^ 


0.0512 

0.0256 


Parallel 100-ns butterflies^ 


0.001 

0.0005 


Array 100-ns butterflies^ 


0.0001 

0.00005 


"“Obtained from programs developed at Concordia University 
**Based on results in Papamichalis and So . 8 
tBased on results in Marrin . 9 
fBased on results in Texas Instruments . 10 
§Based on results in Morris." 

(|Based on results in Mavor and Grant." 


A s just shown in the case of power spectrum calcu¬ 
lation and convolution, the Hartley transform 
becomes an adequate substitute for the popular 
Fourier transform in general applications involving 
real signals. If the Fourier transform is explicitly re¬ 
quired, we can always obtain it from the Hartley trans¬ 
form with Equation 8 or Equation 9. But usually we can 
use the Hartley transform directly. 

The FHT is very easy to implement based on existing 
FFT software or hardware components. It can also 
benefit from improved techniques intended for the 
FFT, such as new algorithms to calculate bit-reverse 
indices 13 or faster hardware multipliers. 

The FHT, generally twice as fast as the complex- 
input FFT, only has the same speed as the real-input 
FFT. However, even if the original input time signal is 
real, the inverse FFT still must deal with complex-input 
values. Thus, the FHT indeed proves faster than the 
FFT when both the forward and the inverse transforma- 
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tions are required—such as in convolution—although 
not quite twice as fast. An FFT program that can take 
advantage of real input signals and is also capable of 
handling complex inputs, while perhaps more general 
than an FHT program, will certainly be larger and much 
more complicated, isi 
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imply put, a benchmark is a standard forjudging relative perform- 
ance arnon 8 various computers. Unfortunately, any simplicity ends 
^1^ with this definition. The first problem arises from the fact that users 
have no official standard to follow when they want to evaluate a benchmark. 
The second problem revolves around the truism that the best benchmark in 
the world that measures someone else’s application does just that. From the 
user’s perspective, the best benchmark in the world accurately measures 
system performance in a target application. The task of creating a good 
benchmark includes the determination of which tests are applicable to the 
user’s environment and how to determine the results. 

Here I generally describe benchmarking and discuss some of the specific 
benchmark tests for computer systems that are in use today. I also examine 
some of the pitfalls involved with benchmark comparison and analysis and 
point out how to avoid them—or at least to minimize the impact of such 
problems. The goal is to learn how to gather and interpret meaningful 
comparison data. 


Benchmarking caveats 

Producing and interpreting benchmark data crosses into the realm be¬ 
tween art and science. No single-task benchmark test (one that only meas¬ 
ures one aspect of a computer’s performance) can fully characterize the true 
performance capability of a system under actual user loads. Safety does 
reside in numbers. A collection of different benchmark tests merely pro¬ 
vides some averages. But users must take care when comparing the results. 

Features like optimizing options on today’s compilers can drastically 
affect the results of benchmark tests. Some vendors publish a range of test 
results using different optimization levels. This documentation probably 
provides the most reliable form of public-domain benchmark data. Combin¬ 
ing this data with the appropriate system configurations that were actually 
used in the tests provides the background information needed to reproduce 
the data, if necessary. 

0272-1732/89/1000-0028$ 1.00 © 1989 IEEE 








Users should also consider the source or “pedigree” 
of a benchmark. Benchmark code tends to migrate quite 
freely, given public-domain networks like Usenet. 
Modifications made to this code, for whatever reason, 
cause the evolution of many different versions. Hand- 
coded libraries or conversions from one language to 
another can cause successive generations of the test. 
Results vary according to which version of a bench¬ 
mark test is used. Because of this process, some bench¬ 
marks have degenerated to the point of being virtually 
useless. 

Users should avoid making a determination of over¬ 
all system performance based on one benchmark test. 
Many benchmarks are one-dimensional in nature (that 
is, they test only one aspect of a system). Some popular 
tests stress only raw processor-instruction bandwidth 
or floating-point performance. Other items affecting 
system performance include 

• file system and compiler efficiencies, 

• cache-memory size and organization, 

• main-memory size and organization, 

• I/O and file transfer speeds, 

• user loads, and 

• application mixes. 

Most benchmark tests do not verify the validity of the 
performance results they produce. Without verified test 
results, users can find it difficult to determine whether 
the benchmark was properly performed. Some optimiz¬ 
ing compilers recognize small benchmark suites and 
simply return the expected result without performing 
any computation. Totally meaningless performance 
data results. 

The only way to truly compare benchmark test results 
is to port and compile the same copy of the operating- 
system source to all of the systems under consideration. 
This procedure minimizes the effects from differences 
in binary copies of the code and ensures a common base 
of software for these systems. Since this task is beyond 
the reach of most organizations, users can employ 
another option. Setting up the test carefully and obtain¬ 
ing a clear understanding of how to interpret the results 
can yield a successful comparison. The goal is to 
establish a level playing field for all of the systems 
being tested. 

Some additional questions to ask when analyzing or 
generating benchmark data follow. 


1) Which real-world application does the bench¬ 
mark measure, and how does the user accurately 
characterize the system’s performance under typi¬ 
cal application work loads? 

Try to choose a set of benchmark tests that analyzes 
the components critical to producing optimal user per¬ 
formance. For example, a CPU-intensive benchmark 
by itself does not provide sufficient data to interpret 
how a system will perform in an I/O-intensive environ¬ 


ment like transaction processing. In this example, the 
benchmark suite should also measure disk-I/O and file¬ 
system efficiencies and work-load capacity (keystroke¬ 
handling capability). 

2) What factors influenced the generation of the 
benchmark data? 

Correlating data from different or biased sources 
probably will not produce an accurate comparison. A 
vendor may provide data from a “hot box” (or nonpro¬ 
duction hardware) or use a compiler that “recognizes” 
a benchmark suite and loads a hand-optimized algo¬ 
rithm for the test. Valid data should contain documen¬ 
tation of test configurations and optimization levels. 
The tests should be repeatable using a production-grade 
system. 

The benchmark test-result data that appears in this 
article (shown in tables) constitutes a general guideline 
only. Users should implement application-specific tests 
(that is, actual application software running in a typical 
user environment) to supplement this information. 

3) Were devices like cache accelerators or opti¬ 
mizing compilers used in all of the configurations 
tested? 

Cache accelerators can make a small benchmark run 
much faster than if the same code were to run in another 
memory location. Optimizing compilers can virtually 
reduce a repetitive benchmark to a simple NOP (no 
operation) and yield totally meaningless results. 

4) Was one system tuned for a particular bench¬ 
mark? 

Some vendors have a reputation for publishing 
benchmark data obtained from a hot box or a modified 
system that a customer could not buy or duplicate with 
off-the-shelf components. Simple operating-system 
kernel tuning can improve benchmark results by as 
much as 50 percent. Hand-optimized utilities can im¬ 
prove benchmark performance by as much as 30 per¬ 
cent. All operating-system parameters and compiler 
options should match closely from one system to an¬ 
other. 

5) What sources should users employ for bench¬ 
mark data? 

To assure the best comparison of results, users should 
investigate several different sources. When possible, 
they should use the results from the same test suite on 
all systems being evaluated. Users can also employ 
independent third-party benchmark data as a verified, 
unbiased source of information. 

In summary, users should 

• determine which aspects of system or component 
performance are to be measured, 

• determine the best source of benchmark suites or 
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performance data (either public-domain or licensed 
third-party packages), 

• ensure that all system-hardware and operating- 
system parameters during benchmark comparisons 
equate as closely as possible, and 

• understand what specific benchmark tests measure 
and what causes the results to vary. 


Benchmark tests 

Here I present a set of descriptions for many of the 
more popular benchmarks. I obtained the actual bench¬ 
mark data contained in the following tables from a 
variety of public-domain sources. I do not intend these 
benchmark figures to provide definitive results, but 
show them to give a general indication of relative 
performance among various computers. In many in¬ 
stances, a published range of performance values exists 
for various computers. Since space cannot reasonably 
present the total data, I list only the highest and lowest 
benchmark results for a system in terms of its particular 
measurement. 

I show the system model number and the general test 
configuration (processor type, the typical amount of 
cache and random-access memory available, the pro¬ 
cessor speed in megahertz, and the processor rating in 
millions of instructions per second) for each entry. (See 
the accompanying box for a discussion of MIPS.) This 
data provides a basic understanding of the test setup. As 
stated, obtaining a description of the entire hardware 
and software test setup—along with the actual re¬ 
sults—adds meaning to model-by-model comparisons. 

Finally, some table entries contain additional infor¬ 
mation in the comments column that provides back¬ 
ground data on how test results were obtained. Ex¬ 
amples include the operating system (such as VMS for 
the VAX and AIX for the IBM computer), the compiler 
options (optimized, unoptimized), and any special 
hardware (68882 floating-point units, or FPUs, and 
floating-point accelerators, or FPAs) used in the test. 
Dashes indicate the information was not published with 
the test results, or is generally unavailable. 

The industry uses a number of public-domain or 
licensed third-party benchmarks, some of which I dis¬ 
cuss in the following sections. (For an outline of some 
popular proprietary benchmarks, see the accompany¬ 
ing box.) 

Dhrystone. This synthetic (nonreal-world) bench¬ 
mark measures processor and compiler efficiency by 
executing a “typical” set of integer calculations. These 
calculations include integer arithmetic, character/ 
string/array manipulation, and pointers. Reinhold P. 
Weicker constructed the benchmark by using measured 
statistical data from actual user programs. The proce¬ 
dure does not use operating-system calls, I/O functions, 
or floating-point operations. The results provide a 


MIPS: A meaningful 
measurement? 

Millions of instructions per second, or MIPS, is 
a popular—-though superficial—way to describe 
computer system performance. MIPS typically 
come from either measured data or calculated 
maximum performance. We usually compare 
system performance ratings in MIPS with that of 
a VAX 11/780, which is considered to be a 1- 
MIPS machine. On a family of systems with a 
common processor (like the Motorola 88000), 
MIPS ratings can help judge relative system inte¬ 
ger performance. The ultimate apples-to-oranges 
phenomenon occurs when we compare MIPS 
ratings between two different architectures like 
RISCs and CISCs (reduced and complex instruc¬ 
tion-set computers). The key points to remember 
about MIPS are 

• instructions do not remain constant from proc¬ 
essor to processor, and 

• MIPS are only meaningful in the context of a 
single processor family. 

To make sense out of MIPS ratings, users must 
first define the term instruction. A direct correla¬ 
tion does not always exist between the number of 
instructions being executed and the amount of 
actual work being accomplished. On a jovial 
note, the industry has come up with many colorful 
ways to use the MIPS acronym: 

• meaningless indicators of performance for 
systems, 

• meaningless information of performance for 
salesmen, or 

• meaningless information from pushy sales¬ 
men. 


general measure of user-level integer performance. 
This benchmark contains little code that can be opti¬ 
mized by vector processor systems. 2 

Weicker wrote the original Dhrystone benchmark in 
the Ada programming language. Rick Richardson later 
rewrote it in the C language and posted it on the Usenet 
network. Results appear in Dhrystones per second. 
Users should exercise caution with the Dhrystone test 
(as with any benchmark test) because it does not always 
reflect how large user applications perform. Also, opti¬ 
mizing compilers can remove useless code from Ver¬ 
sion 1.1 of the benchmark, which improves the per- 
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Proprietary benchmarks 


The following performance benchmarks—in con¬ 
trast to the others in this article—have not entered 
the public domain. Users who wish to obtain more 
information should contact the companies directly. 

Aim benchmarks. Aim Technology in Palo Alto, 
California, sells and maintains two suites of multi¬ 
user benchmark tests. 

Suite III. Written in the C programming language, 
this suite simulates applications that fall into either 
task- or device-specific categories. 

The task-specific routines simulate such func¬ 
tions as word processing, database management, 
and accounting. The device-specific code measures 
the performance of hardware features like memory, 
disk, floating-point, and I/O operations. All meas¬ 
urements represent a percentage of VAX 11/780 
performance. 

The performance and user ratings constitute the 
two most frequently quoted results. The perform¬ 
ance rating represents a percentage of VAX 11/780 
performance in which the VAX equals 100 percent 
(for example, the Motorola SYS3640 supermicro¬ 
computer has a performance rating that is 400 per¬ 
cent of the VAX, or four times the performance). 

The second parameter, or user rating, is the maxi¬ 
mum number of concurrent users that a system can 
support. For example, a VAX 11/780 equals 12 
users. In general, the Aim III suite gives a better 
overall indication of a system’s performance than 
small, single-task benchmarks. The company veri¬ 
fies and maintains all official results. Published, 


copyrighted reports are available on individual 
computer systems. 

Suite V. This benchmark suite measures through¬ 
put in a multitasking workstation environment. The 
design goals of this new suite include 

• the ability to stress single-user, multitasking sys¬ 
tem performance, 

• simulations of real-world systems that employ 
routines based on actual user applications, 

• incremental system loading to gradually in¬ 
crease the stress on system resources, and 

• testing multiple aspects of system performance. 1 

The graphically displayed results plot the work¬ 
load level versus the amount of time (in seconds) to 
process the specific load level. Several different 
models characterize various user environments such 
as financial, publishing, and software development. 

Business Benchmark. Developed and main¬ 
tained by Neal Nelson and Associates in Chicago, 
this collection of 18 separate routines examines 
various aspects of system performance. The bulk of 
these routines consists of a series of loops that 
exercise functions such as disk I/O speed, floating¬ 
point performance, and processor and cache effi¬ 
ciency. Various combinations of the 18 routines 
characterize different real-world business applica¬ 
tions like word processing, accounting, and applica¬ 
tion development. Unlike many other benchmark 
tests in use today, Business Benchmark results indi¬ 
cate that CISCs tend to outperform RISCs on some 
multiuser tasks. 


formance results by as much as a factor of two. 
Dhrystone Versions 2.X eliminated this dead code to 
thwart the efforts of current optimizing compilers. 
Table 1 on the next page reflects Version 1.1. 

Originally, Dhrystone solely used features like peep¬ 
hole optimizers, but today anything appears to be ac¬ 
ceptable—short of in-line coding. When benchmark 
numbers are quoted, users should ask which compiler 
options were selected at execution time. 

The Dhrystone routine itself contains some peculiar 
attributes that have been coded into the program. Part of 
the routine involves copying long, 30-character strings 


that happen to be on unaligned word boundaries. 
Dhrystone uses zero-offset address for 50 percent of its 
memory data references, where a more real-world 
number is somewhere between 10 and 15 percent. 2 
These attributes tend to make some machines appear 
faster than others. The Dhrystone test seems to be 
lenient on processors like the Am29000 because the 
routine does not exercise CPU areas that would slow 
the 29000 down under typical user-application loads. 3 
The Dhrystone benchmark is also small enough to fit 
into the instruction cache of some systems, which 
further skews the results. 
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Table 1. 

Summary of Dhrystone 1.1 benchmark test results. 


System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

Dhrystones/s 
Low High 

Comments 

Alliant FX/8; MP=8 

Proprietary 




35.0 

7,655 

7,655 


ALR FlexCache 25386 

80386 

64 

5 

25.00 

— 

8,671 

8,671 

SCO Unix 

Altos Series 2000 

80386 

— 

— 

16.00 

3.0 

4,237 

4,348 


Amdahl 5860 

— 

— 

— 

— 

— 

28,846 

28.846 

C compiler, V. 1.22 

Amdahl 5890/300E 

— 

— 

— 

— 

— 

43,668 

43,668 

C compiler 

Apollo 5X0T 

68020 

— 

— 

20.00 

3.4 

6,250 

6,250 


Apollo Series 10000 

Prop. RISC 

— 

— 

18.20 

16.0 

25,461 

27,000 

Up to four CPUs 

Apollo Series DN3000 

68020 

— 

— 

12.50 

1.2 

2,186 

2,186 


Apollo Series DN4000 

68020 

0 

4 

25.00 

4.0 

6,038 

7,109 


Apple Macintosh 

68000 

— 

— 

7.70 

— 

625 

625 


Apple Macintosh II 

68020 

— 

— 

15.70 

— 

2,106 

2,719 

H = Greenhills C compiler 

V. 1.8 

Apple Macintosh Plus 

68000 

— 

— 

7.83 

— 

660 

769 


AT&T 3B1 

68010 

— 

— 

10.00 

— 

973 

1,033 

Unix V. 1 

AT&T 3 B 2/300 

32000 

— 

— 

7.20 

— 

409 

699 

L = Unix V. 2; H = Unix V. 3 

AT&T 3B2/400 

32100 

6 

4 

10.00 

1.0 

672 

1,120 

L = Unix V. 3; H = Unix V. 2 

CCI Power 5/32 

68010 

— 

— 

12.50 

— 

1,129 

1,192 

Unix 4.2 BSD 

CCI Power 6/32 

Proprietary 

— 

— 

— 

— 

8,498 

8.498 


CCI Power 7/64 

Proprietary 

— 

— 

— 

— 

53,108 

53,108 


Compaq 386 

80386 

0 

4 

16.00 

— 

1,724 

2,941 


Compaq 386/20 

80386 

0 

4 

20.00 

— 

7,575 

9,335 


Compaq 386/25 

80386 

32 

5 

25.00 

— 

8,277 

10,617 


Convergent Server PC 

80386 

64 

4 

20.00 

5.7 

6,534 

9,436 

L = unopt.; H = opt. 

Convex C-l XP 6.0 

Proprietary 

— 

— 

— 

4.5 

7,249 

7,249 


Cray IS 

Proprietary 

— 

— 

— 

— 

14,820 

14,820 


Cray X-MP 

Proprietary 

— 

— 

— 

— 

18,530 

18,530 


Data Gen. MV 15000-20 

Proprietary 

16 

64 

11.80 

8.0 

8,300 

8,300 


Data Gen. MV20000 

Proprietary 

16 

64 

— 

10.0 

8,300 

8,300 


DEC pVAX 3500 

Proprietary 

— 

— 

11.10 

3.5 

4,746 

4,746 


DEC pVAX II 

Proprietary 

— 

— 

5.00 

0.9 

1,326 

1,612 

L = VMS 

DEC VAX 11/750 

Proprietary 

— 

— 

— 

1.0 

835 

961 


DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

1,243 

1,870 


DEC VAX 11/784 

Proprietary 

— 

— 

— 

— 

5,263 

5,555 


DEC VAX 11/785 

Proprietary 

— 

— 

7.50 

1.5 

1,783 

2,069 


DEC VAX 8550 

Proprietary 

— 

— 

— 

6.4 

8,000 

10,416 


DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

6,423 

10,416 

L = Unix 4.3 BSD 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

4,896 

5,235 

Ultrix V. 1.2 

DEC VAX 8650 

Proprietary 

— 

— 

— 

6.2 

7,123 

10,787 

H = VMS 

DEC VAX 8700 

Proprietary 

— 

— 

— 

6.0 

10,416 

10,416 


DEC VAX 8810 

Proprietary 

— 

— 

22.22 

12.0 

10,416 

10,416 


DEC Vaxstation 2000 

Proprietary 

— 

— 

5.00 

0.9 

1,502 

1.502 


DEC Vaxstation 3200 

Proprietary 

— 

— 

5.00 

2.0 

5,271 

5,271 


Decstation 3100 

R2000 

128 

8 

16.70 

14.0 

16,870 

26,600 

L = unopt.; H = opt. 

Force CPU-2 IB 

68020 

— 

— 

25.00 

— 

5,555 

5,555 


Force CPU-386A 

80386 

— 

— 

16.00 

— 

6,200 

6,200 


HP 9000 Mod. 320 

68020 

— 

— 

16.70 

2.0 

2,464 

2,671 

HP/UX V. 5.02 

HP 9000 Mod. 340 

68030 

— 

— 

16.70 

— 

6,536 

6,536 


HP 9000 Mod. 360 

68030 

0 

4-12 

25.00 

4.5 

6,702 

6,702 


HP 9000 Mod. 500 

Proprietary 

— 

— 

— 

— 

1,599 

1,599 

HP/UX V. 5.05; 1 processor 

HP 9000 Mod. 500 

Proprietary 

— 

— 

— 

— 

3,020 

3,020 

HP/UX V. 5.05; 2 processors 

HP 9000 Mod. 500 

Proprietary 

— 

— 

— 

— 

4,140 

4,140 

HP/UX V. 5.05; 3 processors 

HP 9000 Mod. 550 

Proprietary 

— 

— 

— 

— 

1,518 

1,531 

HP/UX V. 5.11 

HP 9000 Mod. 825S 

Prop. RISC 

16 

— 

12.50 

3.0 

17,829 

17,829 


HP 9000 Mod. 825SRX 

Prop. RISC 

16 

— 

— 

8.0 

13,157 

16,672 


HP 9000 Mod. 835S 

Prop. RISC 

128 

— 

15.00 

4.0 

23,430 

23,441 


HP 9000 Mod. 835SRX 

Prop. RISC 

128 

— 

15.00 

4.0 

23,430 

23,430 


HP 9000 Mod. 840 

Prop. RISC 

128 

24 

8.00 

4.5 

11,165 

11,215 

H = opt. 
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Table 1 (continued). 



Cache 

RAM 

Frequency 

MIPS 

Dhrystones/s 


System/model 

Processor 

(Kbytes) 

(Mbytes) 

(MHz) 

rating 

Low 

High 

Comments 

HP 9000 Mod. 840S 

Prop. RISC 

128 

24 

8.00 

4.5 

9.920 

9,920 


HP 9000 Mod. 850S 

Prop. RISC 

— 

— 

13.70 

7.0 

15,576 

21,358 


IBM 3081 

— 

— 

— 

— 

— 

15,007 

15,007 

C compiler V. 1,5 

IBM 3090/200 

— 

— 

— 

— 

10.0 

31,250 

31,250 


IBM 4341 Mod. 12 

Proprietary 

— 

— 

14.70 

— 

3,690 

3,910 

Opt. 

IBM 4381 Mod. 2 

Proprietary 

— 

— 

— 

— 

4,504 

5,681 


IBM PC AT 

80286 

— 

— 

— 

— 

1,380 

1,380 


IBM PC AT 

80286 

— 

— 

6.00 

— 

531 

531 


IBM PC AT 

80286 

— 

— 

9.05 

— 

692 

1,484 


IBM PS/2 Mod. 70-A21 

80386 

64 

4 

25.00 

— 

8,650 

12,769 

L = SCO Unix ; H = AIX 

IBM RT PC 

Prop. RISC 

— 

— 

5.90 

4.5 

6,097 

6,500 

H = AIX 

IBM RT PC Mod. 135 

Prop. RISC 

— 

— 

7.30 

6.0 

10,770 

10,770 


Integr. Sol. Advantage2000 R2000 

64 

16 

16.70 

12.0 

18,920 

27,100 

L = unopt.; H = opt. 

Intel 386 ATS 

80386 

— 

— 

16.00 

— 

3,424 

3,424 


Intergraph Interpro 32C 

Clipper RISC 

— 

— 

30.00 

5.0 

4,855 

8,309 


Ironies IV-9001 

Am29000 

16 

8 

25.00 

17.0 

35,760 

35,760 


MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

15.100 

25,000 

L = unopt.; H = opt. 

MIPS M/120-3 

R2000 

128 

8 

12.50 

10.0 

23,300 

23,300 


MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

18,700 

31,000 

L = unopt.; H = opt. 

MIPS M/2000 

R3000 

128 

32 20/25.00 16-20.0 

30,700 

47,400 

L = unopt.; H = opt. 

MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

8,800 

14,200 

L = unopt.; H = opt. 

MIPS M/800 

R2000 

128 

8 

12.50 

12.5 

12,800 

21,300 

L = unopt.; H = opt. 

MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

19,100 

31,200 

L = unopt.; H = opt. 

Motorola SYS 1131 

68020 

16 

2 

16.70 

1.5 

3,246 

3,257 


Motorola SYS 1147 

68030 

0 

4 

20.00 

3.8 

6,334 

6,334 


Motorola SYS2300 

68020 

0 

4 

16.70 

1.5 

4,876 

4,876 


Motorola SYS2600 

68020 

16 

4 

16.70 

1.5 

4,566 

4,566 


Motorola SYS3300 

68030 

0 

4 

20.00 

3.8 

6,334 

6,334 


Motorola SYS3600 

68030 

0 

4 

25.00 

4.7 

8,826 

8,826 


Motorola SYS3640 

68030 

64 

4 

25.00 

5.3 

7,942 

8,900 


Motorola SYS3800 

68030 

64 

— 

33.00 

6.9 

9,239 

11,000 

H = Greenhills C V. 1.8.2 

Motorola SYS8600 

88100 

32 

8 

20.00 

17.0 

35,714 

35,714 


Multiflow Trace 7/200 

— 

— 

— 

— 

— 

14,195 

14,195 


NCR Tower 32/400 

68020 

8 

4 

16.70 

1.5 

3,628 

3,638 


NCR Tower 32/450 

68020 

8 

4 

25.00 

— 

4,941 

4,941 


Opus Systems 

88000 

32 

4-20 

20.00 

17.0 

41,166 

41,166 


Prime EXL 316 

80386 

— 

— 

16.00 

3.2 

7,112 

7,112 

Optimized 

Pyramid 90x 

Proprietary 

— 

— 

8.00 

2.5 

1,779 

3,333 

High = w/cache 

Pyramid 98x 

Proprietary 

— 

— 

10.00 

5.4 

3,627 

3,856 


Silicon Graphics Iris 

R2010 

24 

8 

12.50 

— 

18,416 

18,416 


Solbourne 4/600 

Sparc 

64 

16 

16.70 

7.0 

18,715 

18,715 


Sparcstation 1 

Sparc 

0 

4-24 

20.00 

12.5 

22,049 

22,049 


Sparcstation 330 

Sparc 

0 

— 

25.00 

16.0 

27,777 

27,777 


Sun 3/160C w/68881 

68020 

0 

4 

16.70 

2.0 

2.800 

3,850 

L = unopt.; H = opt. 

Sun 3/260 w/68881 

68020 

0 

8-24 

25.00 

4.0 

5,366 

7,142 


Sun 3/470 

68030 

0 

— 

33.00 

7.0 

11,748 

11,748 


Sun 3/50 

68020 

0 

4 

15.00 

1.5 

2,280 

2,695 


Sun 3/60 

68020 

0 

4 

16.70 

2.0 

4,295 

4,545 


Sun 3/80 

68030 

0 

4 

20.00 

3.0 

5,154 

5,154 


Sun 386i Mod. 150 

80386 

32 

8 

14.30 

3.5 

8,388 

8,388 

w/80387 FPU 

Sun 4/260 w/Weitek 

Sparc 

0 

16-32 

16.70 

10.0 

10,550 

19,900 

L = unopt.; H = opt. 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

14,109 

14,201 


Tandy 3000 

80286 

— 

— 

8.00 

— 

1.455 

1,543 


Tandy 6000 

68000 

— 

— 

8.00 

— 

1,286 

1,366 


Tektronix 4319 

68020 

0 

4 

20.00 

2.5 

7,581 

7,581 


Unisys 5000/90 

68020 

— 

— 

12.50 

1.0 

3,307 

3,311 
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Table 2. 

Summary of Digital Review benchmark test results. 


System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

Seconds 

Low High Comments 

Alliant FX/8 

Proprietary 




35.0 

1.48 

1.48 

Convex C-l XP 

Proprietary 

— 

— 

— 

4.5 

0.487 

0.487 

DEC |iVAX II/GPX 

Proprietary 

— 

— 

5.00 

0.9 

9.17 

9.17 

DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

6.75 

6.75 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

2.32 

2.32 VMS V. 4.5 

DEC VAX 8650 

Proprietary 

— 

— 

— 

6.2 

1.584 

1.584 

DEC VAX 8700 

Proprietary 

— 

— 

— 

6.0 

1.469 

1.469 

DEC Vaxstation 3200 

Proprietary 

— 

— 

5.00 

2.0 

2.9 

2.9 

Elxsi 6420 

Proprietary 

— 

16 

20.00 

— 

1.193 

1.193 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

0.94 

0.99 

MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

0.783 

0.783 

MIPS M/2000 

R3000 

128 

32 

20/25.00 

16-20.0 

0.553 

0.553 

MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

1.86 

1.86 

MIPS M/800 

R2000 

128 

8 

12.50 

12.5 

1.2 

1.2 

Sun 4/260 

Sparc 

0 

16-32 

16.70 

10.0 

1.72 

2.09 Weitek FPU 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

2.32 

2.32 


Digital Review. Digital Review magazine has com¬ 
piled a set of benchmark routines that mixes 34 individ¬ 
ual integer and floating-point routines. The test itself 
stresses floating-point performance. This large bench¬ 
mark contains over 3,000 lines of Fortran code. The 
Digital Review benchmark does not perform any veri¬ 
fication of test results; these results usually appear as a 
list of the geometric mean of all tests, performed in 
seconds. On a secondary level, the test normalizes 
relative comparisons among various systems to the 
Digital MicroVAX II, which is equal to 1.0. These units 
are called MicroVAX units of processing (MVUPs). 
Table 2 lists only the raw benchmark-test results in 
seconds. 

Users have criticized this benchmark for its odd 
structure and unusual instruction mix that does not 
accurately mimic real-world program flow. Initializing 
the routines within the timing loops, rather than run¬ 
ning the actual benchmark code, consumes a large 
amount of time. This practice usually results in a low 
estimate of a system’s actual MVUP rating. More 
recently, Digital Review magazine has taken steps to 
revise its benchmark (now called CPU2) to correct 
some of these odd programming sequences. (Table 2 
reflects the previous version.) 

Dodoc. This 5,300-line Fortran program—which 
simulates the operations within a nuclear reactor— 


accurately tests instruction-fetch bandwidth and scalar 
floating-point performance. Compilers can vectorize 
very little of the code. The routine uses the Monte Carlo 
method of simulation in which an iterative process 
converges on an expected result. The routine was origi¬ 
nally designed as a check of both compiler and intrinsic 
(real-world) functions. Normalized results appear in 
terms of the ratio of CPU time needed to perform the 
test versus an arbitrary defined reference. 4 This R factor 
is normalized where 100 equals the performance of the 
IBM 370 Model 168. The algorithm calculates that/? = 
48,671/seconds of processor time. Larger R factors 
equate to higher system performance. 

Differences in floating-point accuracy (that is, 
single- or double-precision calculations), unique char¬ 
acteristics of mathematical libraries, and rounding er¬ 
rors contribute to how fast the algorithm converges on 
the expected answer. (See Table 3.) 

Khornerstone. Developed by Workstation Labora¬ 
tories, this benchmark yields a normalized rating on 
overall system performance using 22 separate tests. 5 
This suite of tests includes a mix of both public-domain 
(Dhrystone, Sieve, etc.) and proprietary benchmark 
routines. The result is a unit of measure called Khorner- 
stones per second. This set of routines measures char¬ 
acteristics of processor, floating-point, and disk per¬ 
formance. The Khornerstone test measures single-user 
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Table 3. 

Summary of Dodoc benchmark test results. 


System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

R Factor 

Low High 

Comments 

Alliant FX/80; MP = 4 

68020 

— 

32 

— 

— 

248 

248 

64 bits 

HP 9000 Mod. 370 

68030 

64 

8-48 

33.00 

7.0 

47 

68 

64 bits; L = 68882; 

H = Weitek 

Alliant FX-1 

Proprietary 

— 

— 

12.00 

— 

85 

85 

64 bits 

Alliant FX/8; MP = 8 

Proprietary 

— 

— 

— 

35.0 

101 

101 


Amdahl 470 V8 

— 

— 

— 

— 

— 

150 

150 

64 bits 

Amdahl 5860 

_ 

_ 

_ 

_ 

_ 

475 

475 

64 bits 

Apollo Series 10000 

Prop. RISC 

— 

— 

18.20 

16.0 

291 

460 

64 bits; H = Fortran 

V. 10.5r 15 

Bull DSP 90/x 

— 

— 

— 

— 

— 

371 

371 

64 bits 

CCI Power 6/32 

Proprietary 

— 

— 

— 

— 

50 

50 


CDC Cyber 990-E 

Proprietary 

— 

— 

— 

6.2 

592 

592 

64 bits 

Celerity 1260 

Proprietary 

— 

— 

— 

6.2 

48 

48 


Compaq 386/25 

80386 

32 

5 

25.00 

— 

73 

73 

64 bits; WTL3167 & DOS 

Convex C-120 

Proprietary 

— 

32 

— 

— 

103 

103 

64 bits 

Convex C-210; MP = 1 

Proprietary 

— 

— 

— 

— 

296 

296 

64 bits 

Cray X-MP 

Proprietary 

— 

— 

— 

— 

1,080 

1,080 


Cray X-MP/28 

Proprietary 

— 

— 

— 

— 

1,701 

1,701 

64 bits 

Data Gen. MV20000 

Proprietary 

16 

64 

— 

10.0 

99 

99 


Data Gen. MV40000 

Proprietary 

— 

— 

— 

— 

246 

246 

64 bits 

DEC VAX 11/780 VMS 

Proprietary 

— 

— 

5.00 

1.0 

26 

26 

64 bits 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

91 

91 

VMS; 64 bits 

DEC VAX 8650 

Proprietary 

— 

— 

— 

6.2 

129 

129 


DEC VAX 8700 

Proprietary 

— 

— 

— 

6.0 

136 

136 

VMS; 64 bits 

Decstation 3100 

R2000 

128 

8 

16.70 

14.0 

255 

268 

64 bits 

Edge 1 

Proprietary 

— 

— 

— 

— 

53 

53 


Floating-Pt. 264 SJE 

Proprietary 

— 

— 

— 

— 

396 

396 

64 bits 

Fujitsu VP-200 

— 

— 

— 

— 

— 

915 

915 

64 bits 

HP 9000 Mod. 835S 

Prop. RISC 

128 

— 

15.00 

4.0 

214 

214 


HP 9000 Mod. 850S 

Prop. RISC 

— 

— 

13.70 

7.0 

201 

201 


HP 9000 Mod. 855S 

Prop. RISC 

256 

— 

25.00 

— 

266 

266 

64 bits 

Harris HCX-7 

Proprietary 

— 

— 

— 

7.7 

64 

64 


IBM 3081G 

_ 

_ 

_ 

— 

— 

181 

181 


IBM 308IK 

— 

— 

— 

— 

— 

236 

236 

64 bits 

IBM 3090 

— 

— 

— 

— 

10.0 

714 

714 

Scalar mode 

IBM 3090/200 

— 

— 

— 

— 

10.0 

847 

847 

64 bits 

IBM 4381 Mod. 2 

Proprietary 

— 

— 

— 

— 

90 

90 

64 bits 

Intergraph Interpro C245 

Clipper RISC 

— 

— 

33.00 

— 

40 

40 

64 bits 

Intergraph Interpro C370 

Clipper RISC 

— 

— 

40.00 

— 

72 

72 

64 bits 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

227 

227 


MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

280 

289 

64 bits 

MIPS M/2000 

R3000 

128 

32 20/25.00 16-20.0 

438 

443 

64 bits 

MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

88 

100 


MIPS M/800 

R2000 

128 

8 

12.50 

12.5 

190 

19 


MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

238 

238 

64 bits 

Silicon Graphics 4D/70 

R2010 

24 

8 

12.50 

— 

170 

170 

64 bits 

Sun 3/110 

68020 

0 

4 

16.70 

2.0 

17 

17 


Sun 3/160 W/FPA 

68020 

0 

4 

16.70 

2.0 

35 

35 

64 bits 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

22 

43 

L = 68881; H = Weitek 

Sun 3/260 w/FPA 

68020 

0 

8-24 

25.00 

4.0 

54 

54 

64 bits 

Sun 386i Mod. 250 

80386 

32 

16 

25.00 

— 

26 

26 

64 bits 

Sun 4/260 w/Weitek 

Sparc 

0 

16-32 

16.70 

10.0 

90 

97 

Weitek FPU; 64 bits 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

69 

75 
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Table 4. 

Summary of Khornerstone benchmark test results. 



Cache 

RAM 

Frequency 

MIPS 

Khornerstones/s 


System/model 

Processor 

(Kbytes) 

(Mbytes) (MHz) 

rating 

Low 

High 

Comments 

ALR FlexCache 25386 

80386 

64 

5 

25.00 


3,917 

3,917 

Unix OS 

Altos Series 2000 

80386 

— 

— 

16.00 

3.0 

3,038 

3,038 

Xenix 

Apollo Series 10000 

Prop. RISC 

— 

— 

18.20 

16.0 

54,768 

54,768 

Up to four CPUs 

Apollo Series DN3000 

68020 

— 

— 

12.50 

1.2 

2,469 

2,469 


Apollo Series DN4000 

68020 

0 

4 

25.00 

4.0 

5,296 

5,296 


Compaq 386 

80386 

0 

4 

16.00 

_ 

1,941 

1,941 


Compaq 386/20 

80386 

0 

4 

20.00 

— 

3,902 

4,418 

L = MS-DOS, H = Unix 

Compaq 386/25 

80386 

32 

5 

25.00 

— 

5,037 

7,417 

L = SCO Unix 

DEC Vaxstation 2000 

Proprietary 

— 

— 

5.00 

0.9 

2,181 

2,191 


Decstation 3100 

R2000 

128 

8 

16.70 

14.0 

5,206 

5,206 


Dell System 325 

80386 

32 

4 

25.00 


7,001 

7,001 

Unix 

HP 9000 Mod. 340 

68030 

— 

— 

16.70 

— 

4,880 

4,880 


HP 9000 Mod. 360 

68030 

0 

4-12 

25.00 

4.5 

6,639 

6,995 

L = 68882; H = FPA 

HP 9000 Mod. 835S 

Prop. RISC 

128 

— 

15.00 

4.0 

22,329 

22,329 


IBM PS/2 Mod. 155SX 

80386 

0 

2 

16.00 

— 

2,041 

2,041 


IBM PS/2 Mod. 70-121 

80386 

0 

4 

20.00 

_ 

6,800 

6,800 

AIX 

IBM PS/2 Mod. 70-A21 

80386 

64 

4 

25.00 

— 

3,967 

6,800 

L = SCO Unix; H = AIX 

IBM PS/2 Mod. 80 

80386 

— 

— 

16.00 

2.8 

2,265 

2,265 


IBM RT PC 

Prop. RISC 

— 

— 

5.90 

4.5 

5,459 

7,455 


IBM RT PC Mod. L135 

Prop. RISC 

— 

— 

7.30 

6.0 

7,296 

7,296 

W/FPA 

MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

18,819 

23,218 


MIPS M/2000 

R3000 

128 

32 

20/25.00 16-20.0 

26,177 

26,177 


NCR Tower 32/400 

68020 

8 

4 

16.70 

1.5 

2,231 

2,949 

L = no cache; H = cache 

NCR Tower 2/450 

68020 

8 

4 

25.00 

— 

3,932 

3,932 


Solbourne 4/600 

Sparc 

64 

16 

16.70 

7.0 

14,515 

14,515 


Sun 3/160C 

68020 

0 

4 

16.70 

2.0 

2,982 

3,610 

68881 FPU 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

5,454 

6,767 


Sun 3/50 

68020 

0 

4 

15.00 

1.5 

2,732 

2,733 


Sun 3/60 

68020 

0 

4 

16.70 

2.0 

3,966 

3,966 


Sun 386i Mod. 150 

80386 

32 

8 

14.30 

3.5 

5,317 

5,317 

w/80387 FPU 

Sun 386i Mod. 1250 

80386 

32 

16 

25.00 

_ 

5,317 

5,317 


Sun 4/260 

Sparc 

0 

16-32 

16.70 

10.0 

11,440 

20,729 

L = Weitek FPU 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

7,265 

7,265 


Tektronix 4319 

68020 

0 

4 

20.00 

2.5 

4,114 

4,114 



loads on a system and therefore does not accurately 
measure multiuser performance. (See Table 4.) 

Linpack. This widely used benchmark provides a 
relative indication of system performance for engineer¬ 
ing and scientific applications. Argonne National Labo¬ 
ratories wrote Linpack, which it also maintains. The 
routine tests vector performance on individual scien¬ 
tific applications while it stresses cache performance. 


As a linear-equations package, Linpack emphasizes 
floating-point addition and multiplication. 6 The re¬ 
sults, measured in millions of floating-point operations 
per second (Mflops), are typically derived from a calcu¬ 
lation of a 100 x 100 submatrix of linear equations. 
Since the benchmark is quite large, it does not com¬ 
pletely fit into the cache space of most computers. 
Consequently, memory speed as well as floating-point 
processor performance can affect test results. Compil¬ 
ers can easily vectorize the routine, which results in 
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higher performance ratings on those vector processor 
systems. 

Unpack uses a set of general-purpose utilities called 
Basic Linear Algebra Subroutines (BLASs) to do the 
actual calculations. The benchmark subroutines can 
come in two forms: coded or Fortran. Some vendors use 
hand-coded, assembly-language versions of the library 
package, which typically yield improved benchmark 
performance results. The Fortran version comes with a 
set of standard algebraic subroutines (Fortran librar¬ 
ies). These Fortran BLASs come in two forms: unrolled 
and rolled. The body of the unrolled form contains an 
inner loop that is coded with multiple statements, while 
the rolled version contains a single statement that effec¬ 
tively performs the same operation. As would be ex¬ 
pected, the rolled version of the Fortran BLAS yields 
higher (faster) benchmark ratings. (Table 5 reflects 
single-precision results.) 

Livermore Fortran kernel. This benchmark pro¬ 
vides insight into the system performance of scientific 
applications in both vector and nonvector processor 


environments. Lawrence Livermore National Labora¬ 
tories developed the code. Since its work is dominated 
by large scientific calculations that can be vectorized, 
the laboratory developed this benchmark to evaluate 
large supercomputer systems. The benchmark has since 
migrated to the rest of the computer industry—even to 
personal computers. 

The actual test consists of 24 sections of code taken 
from applications typically run by the laboratory. These 
kernels inhabit a larger benchmark driver that runs the 
routines several times, using different input data each 
time. The driver also checks on the accuracy and timing 
of the results and produces a statistical report of the 
test. 2 The results appear as 24 separate numbers (one for 
each kernel) as Mflop measurements for three different 
vector lengths, which total 72 results. Various statisti¬ 
cal means (arithmetic, geometric, harmonic) provide 
insight into general-system performance. An analysis 
performed by Livermore Labs suggests that each statis¬ 
tical mean corresponds to a level of system vectoriza- 
tion. 2 In terms of vectorization, the harmonic, geomet¬ 
ric, and arithmetic means approximate 40, 70, and 90 
percent. These three results are often interpreted as 


Table 5. 

Summary of single-precision Linpack benchmark test results. 

System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

Mflops 

Low High Comments 

Altos Series 2000 

80386 

— 

— 

16.00 

3.0 

0.11 

0.11 

Apollo DN660 

Proprietary 

— 

— 

— 

1.0 

0.1 

0.1 

Apollo Series DN3000 

68020 

— 

— 

12.50 

1.2 

0.07 

0.073 

Apollo Series DN4000 

68020 

0 

4 

25.00 

4.0 

0.14 

0.14 

Celerity 1200 

Proprietary 

— 

— 

— 

2.3 

300.0 

300.0 

Compaq 386/20 w/80387 

80386 

0 

4 

20.00 

_ 

0.14 

0.26 

Compaq 386/20 w/Weitek 80386 

0 

4 

20.00 

— 

0.64 

0.64 

Data Gen. MV 10000 

Proprietary 

— 

— 

— 

2.5 

390.0 

390.0 

DEC VAX 11/785 

Proprietary 

— 

— 

7.50 

1.5 

0.23 

0.23 w/FPA 

DEC Vaxstation 2000 

Proprietary 

— 

— 

5.00 

0.9 

0.162 

0.162 

HP 9000 Mod. 340 

68030 

— 

_ 

16.70 

_ 

0.167 

0.168 

HP 9000 Mod. 360 

68030 

0 

4-12 

25.00 

4.5 

0.24 

0.66 L = 68882; H = FPA 

HP 9000 Mod. 835SRX 

Prop. RISC 

128 

— 

15.00 

4.0 

2.29 

2.29 

IBM PC AT 

80286 

— 

— 

— 

— 

0.013 

0.013 

IBM RT PC 

Prop. RISC 

— 

— 

5.90 

4.5 

0.11 

0.36 L = 68881; H = FPA 

IBM RT PC Mod. 135 

Prop. RISC 

— 

— 

7.30 

6.0 

0.44 

0.44 w/FPA 

Motorola MVME181-2 

88100 

32 

— 

25.00 

— 

1.8 

1.82 

Motorola MVME188SP-1 

88100 

128 

16 

20.00 

17.0 

1.71 

1.77 

Motorola SYS8800 

88100 

128 

16 

20.00 

17.0 

1.71 

1.77 

NCR Tower 32/400 

68020 

8 

4 

16.70 

1.5 

0.095 

0.1 L = no cache; H = cache 

NCR Tower 32/450 

68020 

8 

4 

25.00 

_ 

0.217 

0.217 

Pyramid 90x/FP 

Proprietary 

— 

— 

8.00 

2.5 

200.0 

200.0 

Sun 3/50 

68020 

0 

4 

15.00 

1.5 

0.092 

0.092 

Sun 386i Mod. 150 

80386 

32 

8 

14.30 

3.5 

0.25 

0.25 w/80387 FPU 

Tektronix 4319 

68020 

0 

4 

20.00 

2.5 

0.105 

0.105 


October 1989 37 







Benchmarking 


Table 6. 

Summary of single-precision Livermore Fortran kernel benchmark test results. 


System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

Mflops 

Low High 

Comments 

Acer 1100/25 

80386 

32 

4 

25.00 


0.097 

0.12 

L = DOS/X; H = Unix 

Alliant FX-1. Scalar 

Proprietary 

— 

— 

12.00 

— 

0.66 

0.66 


Alliant FX-1, Vector 

Proprietary 

— 

— 

12.00 

— 

0.6 

0.6 


Alliant FX/8; MP = 8 

Proprietary 

— 

— 

— 

35.0 

1.3 

1.31 


ALR FlexCache 25386 

80386 

64 

5 

25.00 

— 

0.08 

0.12 

L = DOS/X; H = Unix 

AMI Mark II 

80386 

64 

_ 

33.00 


0.3 

0.3 

w/Weitek 3167 

AST 386/33 

80386 

0 

— 

33.00 

— 

0.13 

0.275 

L = 80387; H = Weitek 3167 

AST 386C-390 

80386 

64 

4 

25.00 

— 

0.063 

0.08 

L = DOS/X; H =Unix 

Cheetah cAT 386 

80386 

0 

5 

20.00 

— 

0.05 

0.05 


Compaq 386/25 w/80387 

80386 

32 

5 

25.00 

— 

0.09 

0.133 

L = DOS/X; H = DOS/X 

Compaq 386/25 

80386 

32 

5 

25.00 

— 

0.22 

0.22 

w/Weitek 

Convex C-l Scalar 

Proprietary 

— 

— 

— 

4.5 

1.11 

1.11 


Convex C-l Vector 

Proprietary 

— 

— 

— 

4.5 

1.27 

1.27 


DEC VAX 11/780 4.3 BSD 

Proprietary 

— 

— 

5.00 

1.0 

0.18 

0.18 


DEC VAX 11/780 VMS 

Proprietary 

— 

— 

5.00 

1.0 

0.28 

0.28 


DEC VAX 8700 VMS 

Proprietary 

— 

— 

— 

6.0 

1.26 

1.26 


Dell System 310 

80386 

32 

3 

20.00 

— 

0.07 

0.09 

L = DOS/X; H = Unix 

Dell System 325 

80386 

32 

4 

25.00 

— 

0.1 

0.12 

L = DOS/X; H = Unix 

Elxsi 6420 

Proprietary 

— 

16 

20.00 

— 

1.31 

1.31 


Everex Step 386/25 

80386 

64 

4 

25.00 

— 

0.09 

0.11 

L = DOS/X; H = Unix 

Fivestar 386/20 

80386 

64 

4 

20.00 

_ 

0.073 

0.091 

L = DOS/X; H = Unix 

HP Vectra RS/25C 

80386 

32 

4 

25.00 

— 

0.088 

0.109 

L = DOS/X; H = DOS/X 

Hertz 386/25 

80386 

64 

4 

25.00 

— 

0.1 

0.11 

L = DOS/X; H = Unix 

IBM PS/2 Mod. 70-121 

80386 

0 

4 

20.00 

— 

0.06 

0.08 

L = DOS/X; H = Unix 

IBM PS/2 Mod. 70-A21 

80386 

64 

4 

25.00 

— 

0.09 

0.11 

L = DOS/X; H = Unix 

Micro Express 386/25 

80386 

64 

4 

25.00 

_ 

0.101 

0.121 

L = DOS/X; H = DOS/X 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

2.02 

2.29 


MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

2.58 

2.85 


MIPS M/2000 

R3000 

128 

32 20/25.00 16-20.0 

3.8 

4.24 


MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

0.97 

0.97 


MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

2.82 

2.82 


Proteus 4400GL 

80386 

64 

4 

25.00 

— 

0.101 

0.121 

L = DOS/X; H = DOS/X 

Rupp 386/20 

80386 

0 

5 

20.00 

— 

0.056 

0.056 

Sun 3/160 

68020 

0 

4 

16.70 

2.0 

0.46 

0.46 

w/FPA 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

0.65 

1.04 


Sun 3/60 

68020 

0 

4 

16.70 

2.0 

0.14 

0.14 


Sun 386i Mod. 250 

80386 

32 

16 

25.00 

— 

0.08 

0.08 


Sun 4/260 

Sparc 

0 

16-32 

16.70 

10.0 

1.03 

1.04 

w/Weitek 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

0.77 

0.77 


Tandon 386/20 

80386 

64 

4 

20.00 

— 

0.08 

0.09 

L = DOS/X; H = Unix 

Tandy 4000 LX 

80386 

0 

16 

20.00 

— 

0.07 

0.08 

L = DOS/X; H = Unix 


separate benchmark tests. (Table 6 reflects single¬ 
precision results at 40-percent vectorization.) 

SPICE. The general-purpose Simulation Program 
with Integrated Circuit Emphasis came from the Uni¬ 
versity of California at Berkeley. This benchmark 
makes heavy use of both integer and double-precision. 


floating-point calculations (the floating-point opera¬ 
tions are not vector oriented). Because it is quite large, 
the program is a good test of system instruction and 
data-cache performance. SPICE accepts a circuit de¬ 
scription as input and simulates the design. The user 
can monitor currents and voltages at various circuit 
locations. 
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Table 7. 

Summary of SPICE benchmark test results. 


System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

Seconds 

Low High 

Comments 

Apollo Series 10000 

Prop. RISC 

_ 

_ 

18.20 

16.0 

5.0 

5.1 

Up to four CPUs 

DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

154.4 

154.4 

w/FPA; Unix 4.2 BSD 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

28.0 

28.0 

Unix 

Decstation 3100 

R2000 

128 

8 

16.70 

14.0 

32.13 

32.13 


Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

31.0 

31.0 


Sun 4/260 

Sparc 

0 

16-32 

16.70 

10.0 

19.0 

19.0 


Sun 4/260 

Sparc 

0 

16-32 

16.70 

10.0 

61.78 

61.78 

w/Weitek 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

79.87 

79.87 



UC Berkeley and several system vendors have dis¬ 
tributed various input data packs for simulation of 
different types of circuits. The user must take care to 
ensure that benchmark results from different systems 
have used the same circuit simulations. The number of 
seconds required to perform the simulation constitutes 
the test-result measurements. (Table 7 reflects data 
from 2G6 circuit simulations only.) 

Stanford Integer. John Hennessey of Stanford 
University compiled this benchmark test. Written in the 
C programming language, the suite consists of a series 
of small programs that use algorithms to solve real- 
world problems. 

Some of these small programs include the Towers of 
Hanoi and the Eight Queens puzzles, multiplication of 


integer matrices, and the quick and bubble sorts. Yet 
another routine inserts and recursively searches a bi¬ 
nary tree. Test measurements consist of the geometric 
mean of all results displayed in either seconds or milli¬ 
seconds (See Table 8.) 

Stanford Floating Point. The program uses tight 
loops and a large proportion of floating-point code to 
calculate the results. 2 The routines’ susceptibility to 
code optimization by high-quality compilers affects the 
results. The suite consists of the fast Fourier transform 
(FFT) and matrix multiplication (MM) tests. The first 
test typically computes a 256-point, single-precision 
FFT 20 times. The second test multiplies two 40 x 40 
single-precision matrices. The results appear in either 
seconds or milliseconds. (See Table 8.) 


Table 8. 

Summary of Stanford benchmark test results. 


System/model 

Processor 

Cache RAM 

(Kbytes) (Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

Milliseconds 

Low High Comments 

Stanford Integer 








DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

1,550 

2,140 

DEC VAX 8550 

Proprietary 

— 

— 

— 

6.4 

260 

350 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

620 

860 

DEC VAX 8810 

Proprietary 

— 

— 

22.22 

12.0 

6.5 

6.5 

Decstation 3100 

R2000 

128 

8 

16.70 

14.0 

115 

115 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

130 

130 

MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

112 

118 

MIPS M/2000 

R3000 

128 

32 20/25.00 16-20.0 

67 

75 

MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

260 

260 

MIPS M/800 

R2000 

128 

8 

15.00 

15.0 

160 

160 

MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

110 

110 

Sun 3/160 

68020 

0 

4 

16.70 

2.0 

530 

530 
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Table 8. 

Summary of Stanford benchmark test results (continued). 

System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency MIPS 
(MHz) rating 

Milliseconds 

Low High 

Comments 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

360 

360 


Sun 4/260 

Sparc 

0 

16-32 

16.70 

10.0 

150 


150 w/Weitek 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

222 

222 


Stanford Floating Point 









DEC pVAX II w/Ultrix 

Proprietary 

— 

— 

5.00 

0.9 

3,000 

3,000 

Single-precision MM 

DEC gVAX II w/Ultrix 

Proprietary 

— 

— 

5.00 

0.9 

4,900 

4,900 

256-point FFT 

DEC pVAX II w/VMS 

Proprietary 

— 

— 

5.00 

0.9 

1,900 

1,900 

Single-precision MM 

DEC pVAX II w/VMS 

Proprietary 

— 

— 

5.00 

0.9 

3,300 

3,300 

256-point FFT 

DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

2,040 

2.310 

Single-precision MM 

DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

3,870 

3,970 

256-point FFT 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

1,370 

1,370 

Single-precision MM 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

1,420 

1,420 

256-point FFT 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

120 

120 

Single-precision MM 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

170 

170 

256-point FFT 

MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

54 

70 

Single-precision MM 

MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

95 

100 

256-point FFT 

MIPS M/2000 

R3000 

128 

32 

20/25.00 

16-20.0 

35 

43 

Single-precision MM 

MIPS M/2000 

R3000 

128 

32 

20/25.00 

16-20.0 

64 

64 

256-point FFT 

MIPS M/500 

R2000 

24 

8 

'8.00 

8.0 

260 

260 

Single-precision MM 

MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

340 

340 

256-point FFT 

MIPS M/800 

R2000 

128 

8 

15.00 

15.0 

120 

130 

Single-precision MM 

MIPS M/800 

R2000 

128 

8 

15.00 

15.0 

170 

200 

256-point FFT 

MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

55 

55 

Single-precision MM 

MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

98 

98 

256-point FFT 

Sun 3/160 

68020 

0 

4 

16.70 

2.0 

498 

500 

Single-precision MM 

Sun 3/160 

68020 

0 

4 

16.70 

2.0 

840 

840 

256-point FFT 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

250 

380 

Single-precision MM 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

460 

600 

256-point FFT 

Sun 3/60 

68020 

0 

4 

16.70 

2.0 

870 

870 

Single-precision MM 

Sun 3/60 

68020 

0 

4 

16.70 

2.0 

1,550 

1,550 

256-point FFT 

Sun 4/260 w/Weitek 

Sparc 

0 

16-32 

16.70 

10.0 

250 

263 

Single-precision MM 

Sun 4/260 w/Weitek 

Sparc 

0 

16-32 

16.70 

10.0 

460 

560 

256-point FFT 


Whetstone. This benchmark is a synthetic mix of 
integer and floating-point calculations, transcendental 
functions, conditional jumps, function calls, and array 
indexing. The code usually comes in a Fortran version. 
Considered a classic, the benchmark was designed to 
represent typical scientific programs. 

The original version emerged from an analysis of 949 
Algol-60 programs. 7 The structure of Whetstone defies 
vectorization by many optimizing compilers. Results 


display in thousands or millions of Whetstone inter¬ 
preter instructions per second (KWhips or MWhips, 
sometimes referred to as MegaWhetstones/s). 

This benchmark has fallen out of favor with the 
computer industry in recent years. So many different 
versions of Whetstone (written in either Fortran or the 
C programming language) make correlating the data 
derived from two different sources almost impossible. 
(Table 9 reflects single-precision results.) 
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Table 9. 

Summary of single-precision Whetstone benchmark test results. 

System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

MWhips 

Low High 

Comments 

Acer 1100/25 

80386 

32 

4 

25.00 

— 

2.05 

2.42 

L = Unix; H = DOS/X 

Alliant FX/8; MP = 8 

Proprietary 

— 

— 

— 

35.0 

4.9 

4.9 


ALR FlexCache 25386 

80386 

64 

5 

25.00 

— 

2.03 

2.098 


Altos Series 2000 

80386 

— 

— 

16.00 

3.0 

1.18 

1.18 


AMI Mark II 

80386 

64 

— 

33.00 

— 

7.11 

7.11 

w/Weitek 3167 

Apollo 5X0T 

68020 

— 

— 

20.00 

3.4 

7.74 

2.2 

L = no FPU; H = FPU 

Apollo DN660 

Proprietary 

— 

— 

— 

1.0 

0.69 

0.69 


Apollo Series 10000 

Prop. RISC 

— 

— 

18.20 

16.0 

16.97 

16.97 

Up to four CPUs 

Apollo Series DN3000 

68020 

— 

— 

12.50 

1.2 

0.716 

0.78 


Apollo Series DN4000 

68020 

0 

4 

25.00 

4.0 

1.89 

2.17 


Apollo Series DN4500 

68030 

64 

16 

33.00 

7.0 

3.23 

3.23 


Apple Macintosh II 

68020 

— 

— 

15.70 

— 

0.52 

0.05 


Apple Macintosh IIx 

68030 

0 

4 

16.70 

— 

0.73 

0.73 


Apple Macintosh IIxc 

68030 

0 

4 

16.70 

— 

0.73 

0.73 


Apple Macintosh SE 

68020 

0 

— 

— 

— 

0.05 

0.05 


Apple Macintosh SE/30 

68030 

0 

4 

16.70 

— 

0.73 

0.73 


AST 386/33 

80386 

0 

— 

33.00 

— 

3.66 

7.05 

L = 80387; H = Weitek 3167 

AST 386C-390 

80386 

64 

4 

25.00 

— 

3.66 

7.05 

L = OS/2; H = DOS/X 

CCI Power 7/64 

Proprietary 

— 

— 

— 

— 

14.07 

14.07 


Celerity 1200 

Proprietary 

—- 

— 

— 

2.3 

0.23 

0.23 


Celerity 1260 

Proprietary 

— 

— 

— 

6.2 

0.88 

0.88 


Cheetah cAT 386 

80386 

0 

5 

20.00 

— 

1.11 

1.53 

L = Unix; H = DOS/X 

Compaq 386/20 w/80387 

80386 

0 

4 

20.00 

— 

1.72 

1.89 


Compaq 386/20 w/Weitek 

80386 

0 

4 

20.00 

— 

4.12 

4.31 


Compaq 386/25 w/80387 

80386 

32 

5 

25.00 

— 

2.0 

2.4 

L = Unix; H = DOS/X 

Compaq 386/25 w/Weitek 

80386 

32 

5 

25.00 

— 

5.22 

5.22 


Data Gen. MV 10000 

Proprietary 

— 

— 

— 

2.5 

2.8 

2.8 


Data Gen. MV 1400 DC 

Proprietary 

0 

12 

6.25 

— 

0.97 

0.97 


Data Gen. MV 15000-10 

Proprietary 

16 

64 

11.80 

4.0 

4.6 

4.6 


Data Gen. MV 15000-20 

Proprietary 

16 

64 

11.80 

8.0 

7.13 

7.13 


Data Gen. MV15000-8 

Proprietary 

16 

64 

11.80 

4.0 

3.06 

3.06 


Data Gen. MV2000 DC 

Proprietary 

0 

12 

6.25 

— 

0.9 

0.97 


Data Gen. MV20000 

Proprietary 

16 

64 

— 

10.0 

5.9 

5.9 


Data Gen. MV2500 DC 

Proprietary 

0 

24 

20.00 

— 

1.62 

1.62 


Data Gen. MV7800 

Proprietary 

0 

14 

3.13 

— 

1.13 

1.13 


Data Gen. MV7800 DC 

Proprietary 

0 

14 

3.13 

— 

1.13 

1.13 


Data Gen. MV7800 DCX 

Proprietary 

0 

14 

4.50 

— 

1.58 

1.58 


Data Gen. MV7800 XP 

Proprietary 

0 

14 

4.50 

— 

1.58 

1.58 


DEC pVAX 3500 w/VMS 

Proprietary 

— 

— 

11.10 

3.5 

0.47 

0.47 


DEC pVAX II w/Ultrix 

Proprietary 

— 

— 

5.00 

0.9 

0.4 

0.4 


DEC pVAX II w/VMS 

Proprietary 

— 

— 

5.00 

0.9 

0.93 

0.93 


DEC VAX 11/780 

Proprietary 

— 

— 

5.00 

1.0 

1.313 

1.313 


DEC VAX 11/780 4.3 BSD 

Proprietary 

— 

— 

5.00 

1.0 

0.5 

1.08 


DEC VAX 11/785 

Proprietary 

— 

— 

7.50 

1.5 

1.8 

1.8 

w/FPA 

DEC VAX 8600 

Proprietary 

— 

— 

— 

4.5 

4.6 

4.6 


DEC VAX 8650 

Proprietary 

— 

— 

— 

6.2 

6.1 

6.9 


DEC VAX 8700 

Proprietary 

— 

— 

— 

6.0 

5.9 

6.67 


DEC Vaxstation 2000 

Proprietary 

— 

— 

5.00 

0.9 

0.47 

0.47 


Decstation 3100 

R2000 

128 

8 

16.70 

14.0 

11.5 

13.0 


Definicon Sparc 1 

Sparc 

— 

4 

20.00 

— 

4.4 

4.4 


Dell System 310 

80386 

32 

3 

20.00 

— 

1.6 

2.0 

L = Unix; H = DOS/X 

Dell System 325 

80386 

32 

4 

25.00 

— 

2.0 

2.4 

L = Unix; H = DOS/X 

Everex Step 386/25 

80386 

64 

4 

25.00 

— 

2.0 

2.4 

L = Unix; H = DOS/X 

Fivestar 386/20 

80386 

64 

4 

20.00 

— 

1.54 

1.83 

L = Unix; H = DOS/X 

HP 9000 Mod. 340 

68030 

— 

— 

16.70 

— 

1.713 

1.73 


HP 9000 Mod. 360 

68030 

0 

4-12 

25.00 

4.5 

2.1 

3.0 

L = 68882; H = FPA 
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Table 9. 

Summary of single-precision Whetstone benchmark test results (continued). 

System/model 

Processor 

Cache 

(Kbytes) 

RAM 

(Mbytes) 

Frequency 

(MHz) 

MIPS 

rating 

MWhips 

Low High 

Comments 

HP 9000 Mod. 360 

68030 

0 

4-12 

25.00 

4.5 

2.1 

3.0 

L = 68882; H = FPA 

HP 9000 Mod. 370 

68030 

64 

8-48 

33.00 

7.0 

3.41 

3.41 


HP 9000 Mod. 825S 

Prop, RISC 

16 

— 

12.50 

3.0 

3.5 

3.58 


HP 9000 Mod. 835S 

Prop. RISC 

128 

— 

15.00 

4.0 

9.0 

9.33 


HP 9000 Mod. 840S 

Prop. RISC 

128 

24 

8.00 

4.5 

3.1 

3.1 


HP 9000 Mod. 850S 

Prop. RISC 

— 

— 

13.70 

7.0 

4.2 

8.1 


HP Vectra RS/25C 

80386 

32 

4 

25.00 

— 

1.95 

2.3 

L = Unix; H = DOS/X 

Harris HCX-7 

Proprietary 

— 

— 

— 

7.7 

7.1 

7.1 


Hertz 386/25 

80386 

64 

4 

25.00 

— 

2.0 

2.3 

L = Unix; H = DOS/X 

IBM 308ID 

— 

— 

— 

— 

— 

5.85 

5.85 


IBM 3090 

— 

— 

— 

— 

10.0 

18.0 

18.0 


IBM PS/2 Mod. 70-121 

80386 

0 

4 

20.00 

— 

1.35 

1.63 

L = Unix; H = DOS/X 

IBM PS/2 Mod. 70-A21 

80386 

64 

4 

25.00 

— 

1.94 

2.538 

L = Unix; H = AIX 

IBM RT PC 

Prop. RISC 

— 

— 

5.90 

4.5 

0.81 

1.64 

L = 68881; H = FPA 

IBM RT PC Mod. 135 

Prop. RISC 

— 

— 

7.30 

6.0 

2.12 

2.12 

w/FPA 

Integr. Sol. Advantage2000 R2000 

64 

16 

16.70 

12.0 

11.4 

11.4 


Intergraph Interpro 32C 

Clipper RISC 

— 

— 

30.00 

5.0 

2.98 

2.98 


Micro Express 386/25 

80386 

64 

4 

25.00 

— 

2.03 

2.44 

L = Unix; H = DOS/X 

MIPS M/1000 

R2000 

128 

16 

15.00 

15.0 

10.28 

10.5 


MIPS M/120-3 

R2000 

128 

8 

12.50 

10.0 

9.1 

9.1 


MIPS M/120-5 

R2000 

128 

8 

16.70 

13.0 

11.4 

12.1 


MIPS M/2000 

R3000 

128 

32 20/25.00 16-20.0 

13.8 

18.0 

L = 20 MHz; H = 25 MHz 

MIPS M/500 

R2000 

24 

8 

8.00 

8.0 

5.43 

5.43 

R2360 FPU 

MIPS M/800 

R2000 

128 

8 

15.00 

15.0 

8.57 

8.57 


MIPS RC2030 

R2000 

64 

16 

16.70 

12.0 

12.1 

12.1 


Motorola MVME141-1 

68030 

64 

4 

25.00 

5.3 

0.69 

1.88 


Motorola MVME141-2 

68030 

64 

4 

33.00 

6.9 

2.5 

2.5 


Motorola MVME147 

68030 

0 

4 

20.00 

3.8 

1.43 

1.51 


Motorola MVME147-1 

68030 

0 

4 

25.00 

4.7 

1.79 

1.96 


Motorola MVME180 

88100 

32 

— 

20.00 

17.0 

12.7 

18.2 


Motorola MVME181-2 

88100 

32 

— 

25.00 

— 

15.9 

22.8 


Motorola SYS 1147 

68030 

0 

4 

20.00 

3.8 

1.43 

1.43 


Motorola SYS2300 

68020 

0 

4 

16.70 

1.5 

0.93 

0.93 


Motorola SYS2600 

68020 

16 

4 

16.70 

1.5 

0.87 

0.87 


Motorola SYS3300 

68030 

0 

4 

20.00 

3.8 

1.43 

1.43 


Motorola SYS3600 

68030 

0 

4 

25.00 

4.7 

1.79 

1.79 


Motorola SYS3640 

68030 

64 

4 

25.00 

5.3 

1.79 

1.79 


NCR Tower 32/400 

68020 

8 

4 

16.70 

1.5 

0.864 

0.933 

L = no cache; H = cache 

NCR Tower 32/450 

68020 

8 

4 

25.00 

— 

1.65 

1.65 


Opus Systems 

88000 

32 

4-20 

20.00 

17.0 

18.58 

18.58 


Opus Systems 

88000 

32 

4-20 

20.00 

17.0 

9.43 

9.43 


Prime EXL 316 

80386 

— 

— 

16.00 

3.2 

3.3 

3.3 


Proteus 4400GL 

80386 

64 

4 

25.00 

— 

2.03 

. 2.44 

L = Unix; H = DOS/X 

Rupp 386/20 

80386 

0 

5 

20.00 

— 

1.11 

1.53 

L = Unix; H = DOS/X 

Silicon Graphics Iris 

R2010 

24 

8 

12.50 

— 

7.35 

7.35 


Solbourne 4/600 

Sparc 

64 

16 

16.70 

7.0 

6.65 

6.65 


Sun 3/160C w/68881 

68020 

0 

4 

16.70 

2.0 

1.03 

2.6 

L = 68881; H = FPA 

Sun 3/260 

68020 

0 

8-24 

25.00 

4.0 

1.25 

5.3 

L = 68881; H = FPA 

Sun 3/50 

68020 

0 

4 

15.00 

1.5 

0.71 

0.936 

w/68881 

Sun 3/60 

68020 

0 

4 

16.70 

2.0 

1.27 

1.42 


Sun 386i Mod. 150 

80386 

32 

8 

14.30 

3.5 

1.92 

1.92 

w/80387 FPU 

Sun 386i Mod. 250 

80386 

32 

16 

25.00 

— 

1.63 

1.923 


Sun 4/260 w/Weitek 

Sparc 

0 

16-32 

16.70 

10.0 

4.6 

5.66 

Weitek FPU 

Sun Sparc 4/110 

Sparc 

— 

— 

14.30 

7.0 

4.22 

4.28 


Tandon 386/20 

80386 

64 

4 

20.00 

— 

1.5 

1.9 

L = Unix; H = DOS/X 

Tandy 4000 LX 

80386 

0 

16 

20.00 


1.43 

1.71 

L = Unix; H = DOS/X 
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T he list of public-domain, licensed third-party, and 
proprietary benchmark tests in use today seems 
endless. This plethora, combined with the various 
benchmark revisions, testing procedures, and analysis 
methods, complicates the task of comparing general 
system performance. Understanding simple concepts 
like what a benchmark does—and how the results are 
produced—provides a valuable first step in the right 
direction. H 
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A dvances in semiconductor and very large scale integration technol¬ 
ogy have greatly decreased the cost of computer hardware. This 
factor has boosted research in areas that range from high-perform¬ 
ance computing to distributed processing. As computer systems become 
correspondingly more complex, performance evaluation also becomes more 
essential—and increasingly complicated. 

Performance studies serve three purposes: the design, selection, and im¬ 
provement of systems. For these studies we use tools that either model or 
monitor the target system. Modeling—whether it is analytical or simu¬ 
lated—occurs at the design stage. Monitoring, on the other hand, aids the 
study of the dynamic behavior of a system. A monitor observes the system 
during its normal operation and collects runtime data. Analysis of this data 
provides accurate performance information for system analysis, verifica¬ 
tion, and management. Performance management, for example, ensures that 
preassigned specifications are met and maintained at an acceptable level of 
efficiency. 

In a multiprocessor environment, one must evaluate processing elements 
individually and collectively. Modeling a multiprocessor environment is an 
extremely complex task. It is much more effective to design a proper 
monitor to measure performance. 

Terms 

One can categorize monitors in terms of instrumentational, functional, or 
architectural issues. 

Software vs. hardware. Software monitors consist of programs that 
detect the states of a system. These monitors also comprise sets of instruc¬ 
tions called software probes that detect events. 1 

Hardware monitors, on the other hand, are electronic devices connected 
to specific system points via probes. These probes detect signals that char¬ 
acterize the phenomena to be observed. While software monitors function 
at a logical level, their hardware counterparts work at a physical level. 

The interference in a target system caused by the introduction of the 
monitoring device is called an artifact. 2 To ensure an unbiased indication of 
performance, one should ascertain an artifact’s magnitude and remove it 
from the measurements. 
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Figure 1. The Ted architecture. 


Passive vs. active. The basic function of a monitor is 
passive in that it observes, collects, stores, and outputs 
data to the external world. An active monitor, however, 
further analyzes the data and takes corrective action 
based on this analysis. It dynamically controls the 
operation of the entire system to maintain performance 
specifications. 

Centralized vs. distributed. A centralized monitor 
taps into critical points that provide primary informa¬ 
tion about the system. These points typically serve as 
the communication data paths. However, the selection 
of a location in the system that is central to all process¬ 
ing activity is a design issue. Since a centralized loca¬ 
tion is not generally feasible, our device collects only a 
selected set of data. 

A distributed monitor associates multiple monitors 
with corresponding processors. Each monitor periodi¬ 
cally sends the data it captures to a central analyzer. In 
a master-slave environment, the central analyzer can 
function as the master monitor that controls the func¬ 
tioning of the slave monitors. The slave monitors are 
totally passive and communicate with the master by 
means of a dedicated path. 

Here we describe the design and implementation of 
a hardware monitor that observes an experimental 
multiprocessor system at certain critical points and 
selectively gathers data required for performance 
analysis. 


The target environment 

The Testbed for Distributed Processing, or Ted, 
consists of Intel Corp.’s iSBC 8086 Single Board 
Computers (SBCs) organized into groups or clusters. 
(Figure 1 summarizes the environment.) Each cluster 
consists of several SBCs that communicate via a shared 
memory. Intercluster communication occurs through 
an Ethernet interface. Both loosely coupled and tightly 
coupled environments exist in this architecture. Within 
a cluster, processors are tightly coupled and processes 
are shared between processors. Across clusters, proces¬ 
sors are loosely coupled. 

Developing the distributed kernel for Ted required 
support for process migration. Processes must migrate 
from one cluster to another and be allocatable to any 
processor in the system. The decision of when and how 
to export a process or a group of processes rests on 
factors such as processor workload and node capabil¬ 
ity. A monitor that collects the dynamic system state 
and passes it onto the system scheduler furnishes this 
information. A monitor is also necessary for debugging 
a distributed system. 

The monitoring of Ted concerns both intercluster 
and intracluster traffic. Intercluster traffic is synony¬ 
mous with Ethernet traffic. Since an Ethernet controller 
board provides the interface, traffic data on Ethernet 
can be obtained from the controller or from tools such 
as packet analyzers. Our focus with Ted is therefore 
intracluster monitoring. 

Separate monitoring of each cluster implies that the 
monitor is distributed. In our implementation, each 
cluster contains one monitor that sits on its respective 
Multibus and overlooks the bus transactions. The 
monitor can be programmed to look for a particular set 
of events that trigger an associated action on the hard¬ 
ware monitor. 

The monitor gathers data that can either be stored for 
future analysis or used for real-time dynamic control of 
the system. The device can perform either passive or 
active monitoring. The present monitor is implemented 
in a passive manner. An external system performs the 
analysis of the gathered data. For simplicity, we can use 
any one of the SBCs in the cluster as the external 
system. We plan to eventually achieve active monitor¬ 
ing to fine-tune Ted’s performance. 


Design philosophy 

Figure 2 on the next page demonstrates the basic 
building blocks of a typical hardware monitor. Probes 
detect significant events. An event filter logically proc¬ 
esses and selects these signals. The processing element 
normally comprises counters that obtain timing infor¬ 
mation. Collected data is recorded on storage media in 
the form of a secondary storage device or RAM buffer. 
The control unit coordinates the entire operation. 
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Figure 2. Functional block diagram of a hardware 
monitor. 

The general design considerations for the Ted moni¬ 
tor include data reduction, programmability, communi¬ 
cation paths, and testability. 

Data reduction. Because the monitor observes ev¬ 
ery cycle of the bus, we could obtain a trace or profile 
of the Multibus (shown later in Figure 3) transactions. 
However, such data collection would demand exces¬ 
sive storage capacity. To reduce the amount of gener¬ 
ated data, the monitor records only specified events or 
signals on the bus. 

The monitor employs two basic approaches. It either 
counts the number of times a combination of events 
occurs, or it keeps track of the elapsed time between 
two events in terms of the number of clock cycles. Data 
collected in the count mode represents only an average 
value of the monitored event. In the time mode, the 
collected data helps reconstruct the sequence of events. 
We chose the time mode for this project. 

Programmability, A flexible monitor that hunts for 
different signal patterns must include a user interface 
that allows a template of events as instructions. For 
example, one can program several consecutive Multi¬ 
bus addresses in shared memory into a kernel to indi¬ 
cate process states in a multiprocessor. In this case, we 
monitored a range of memory addresses. The user 
interface supports a signal pattern, a “don’t-care” pat¬ 
tern (to select a range of memory addresses), and time 
duration. 

Communication paths. The monitor communicates 
with the external devices via special data paths. These 
paths exchange the input instructions as well as col¬ 
lected data. This amount of activity often loads the bus. 
Therefore, it is essential that data-path activity does not 
interfere with the system being monitored. 

An SBC could send instructions for monitor pro¬ 
gramming via the Multibus or a dedicated path. These 
messages take no more than a few bus cycles and can be 
removed in later calculations. Data communications 
from the monitor to the SBC, however, demand more 
bus usage. These communications occur via a dedi¬ 
cated path to avoid bus contention. 


Therefore we used two separate simplex paths: SBC- 
to-monitor communications use the Multibus, while 
monitor-to-SBC messages use the dedicated path. 

Testability. Testability—an essential part of any 
hardware design—is imperative to the determination of 
proper monitor functionality. If the activity on the bus 
is known, we can check the collected data for its valid¬ 
ity. But when multiple-processor bus contention oc¬ 
curs, we find it difficult to keep track of—or control- 
bus traffic. Hence we used a method of testing the 
monitor’s functionality without the presence of any 
external input. 


The Ted monitor 

Figure 3 is a block diagram of the Ted monitor. 

Probes. The probes of the monitor connect to the bus 
within the cluster. A standard Multibus edge connector 
serves this function. (Note that we had to implement an 
interface to the bus. 3 Also, we used a buffered receiver 
for all signals running from the bus to the monitor to 
further reduce the loading.) The bus-priority logic on 
the motherboard helps derive the signals that represent 
which SBC has access to the bus. One interrupt line of 
the bus is reserved for monitor usage. 

Event filter. An event filter looks for a set of events 
specified in terms of the signals that are being moni¬ 
tored. This filter can be programmed as a bit pattern for 
comparison with incoming signals. This unit consists 
of a set of registers (as shown in Figure 4) to store the 
bit pattern that represents the event, as well as a com¬ 
parator to match the signals on the bus. The bit pattern 
consists of the following signal lines: 

• 20-bit address lines, 

• 16-bit data lines, and 

• 4-bit read/write lines (I/O and memory read/write). 

Thus we implemented a pattern register (PR) of 40 bits. 
A comparator unit matches all or part of the signals that 
appear on the bus. These signals are latched in the bus 
register. We employed a don’t-care mask register to 
specify which bits are to be compared, using bit-wise 
OR logic. Both the pattern and the mask registers deter¬ 
mine these bits. The output of a comparator activates a 
set of counters. 

Counter unit. This unit implements the time mode 
of operation. The timing diagram in Figure 5 illustrates 
the mechanics of the counter unit. The signals match 
the pattern during clock cycles 1,3, 4, 7, 8, and 9. The 
output signal Equal is high for the clock cycles that 
have an active pattern on the bus and is low for the rest. 
The period of observation totals 12 clock cycles; the 
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Figure 3. Ted hardware monitor. 
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Figure 4. Registers in the event filter. 


Figure 5. Timing diagram of the time-mode operation. 


October 1989 47 































































Hardware monitor 



Figure 6. ASM chart for counter 0. 

pattern becomes active and idle in spurts. The time 
mode requires that the pattern be recorded as alterna¬ 
tive counts of active and idle times. For this example, 
the pattern is active at clock cycle 1, inactive at clock 
cycle 2, active for two cycles starting at clock cycle 3, 
inactive for two cycles starting at clock cycle 5, and so 
on, as indicated by the Equal signal in the diagram. 

One counter measures the amount of inactive time 
and the other measures active time. The control unit 
enables the counters with the help of the State Change 
signal, which is derived by an exclusive OR of the 
present and previous Equal signals. Using two counters 
overlaps the counting and recording processes. Once a 
state change is detected, the next counter activates. At 
the same time, the system unloads and clears the first 
counter and keeps it ready for the next state change. 

The system reserves one bit of the data to ascertain 
whether it pertains to active or inactive time. The 
monitor controls the two counters as a synchronous 
pair. Figure 6 shows the control for the inactive counter 
0 in the form of an Algorithmic State Machine chart. 

Buffer unit. We chose a double-buffering scheme 
that allows one buffer to collect data and the other to 
empty its data into the SBC to maximize concurrency. 
The estimation of buffer size constitutes a critical 
design parameter. The buffer should be large enough to 
collect data over a given period of time without over¬ 
flow and small enough to be emptied within the same 
time span. 

Applying a queueing analysis normally carries out 
the estimation. Assume that the data arrives (at the 


buffer) at some rate R for servicing (it is transferred to 
the SBC) and that the service rate is the data-output rate 
T. Since the periodically serviced buffer can overflow 
if it is too small, our objective was to determine a buffer 
size that would eliminate such an overflow or loss of in¬ 
formation. In the following, we determine the input and 
output rates and the distribution of the data. 

The rate at which the system writes to the buffer 
depends upon the rate at which the Equal signal changes 
from state to state. The higher the frequency of change, 
the higher the input rate to the buffer. The worst case 
condition corresponds to a state change during every 
clock cycle. With a clock frequency of 10 megahertz, 
the input rate is 1 occurrence per 100 nanoseconds. In 
addition, we assume that this input rate follows a Pois¬ 
son distribution. 4 

Output rate. The nature of the underlying communi¬ 
cation path largely determines the output rate for 
emptying the buffer. We assume the path to be simplex, 
that is, the data transfers in only one direction along this 
dedicated path. The parallel port of the SBC serves as 
the interface. The time required to transfer the data 
from this interface to the local memory of the SBC can 
be determined from the SBC's constituents. 5 This pro¬ 
cess involves an 8225A programmable peripheral in¬ 
terface and an 8259A priority-interrupt controller. We 
calculated the time required to read a byte of data as 
3.225 microseconds. 6 We therefore estimate 4|is as the 
nominal service time required to transfer a unit of data. 

Buffer size estimate. Let N be the buffer size, R the 
mean input rate, and T the transfer time per byte. The 
SBC takes NT time to flush the buffer. Given two 
buffers and an input rate that follows Poisson statistics, 
we define P{N) as the probability that the second buffer 
has N arrivals in time T. That is, 

P(N) = [(/?7’)' v exp“' i7 j|/W! 

To eliminate overflow, we need to ensure that the 
probability of N or more arrivals in the given time is at 
the minimum, say, less than 0.15 percent. That is, 

P(N) + P(N+ 1) + P(N + 2) + .. . < 0.0015. 

Determining buffer size N is based on the assumption 
that the probability of the second buffer overflowing is 
less than the chosen value, given that it takes NT time 
to flush the first buffer. Using T = 4 ps and R = 1/(100 
ns), a good value of N turns out to be 60. 

Figure 7 provides a few details of the actual implem¬ 
entation. The buffer has four distinct components: the 
data registers, RAMs, address registers (ARs), and the 
line drivers. An AR starts at zero and counts positively 
every time a word has to be written. When the count 
reaches its maximum value, the buffer fills, which ini¬ 
tiates a corresponding signal to the control unit. An AR 
is in the up count mode during this operation. 

Once the data-gathering operation completes—or 
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Figure 7. Buffer unit. 

the buffer fills—the system maintains the value of the AR until 
a data-send operation begins. When the data is ready to be sent 
to the SBC, the control unit activates the read line of the 
corresponding RAM buffer and enables the line driver. The AR 
switches to the down mode. The unit routes the data read from 
the RAM to the parallel port of the SBC via the line driver. It 
also sends a strobe signal to the parallel port of the SBC to latch 
the data onto the port. The SBC acknowledges with the Input 
Buffer Full (IBF) handshake signal. When the monitor recog¬ 


nizes (from IBF) that the latched data has been 
cleared, it decrements the corresponding AR 
and reads the next word. This process continues 
until the AR reaches a value of zero, which 
indicates that the buffer is empty. The control 
unit then sets the appropriate status bits. 


Control unit. This unit (shown earlier in Fig¬ 
ure 3) coordinates the actions of the other units. 
It receives instructions from an external source, 
interprets them, and initiates actions on the 
monitor board. The control unit responds to the 
following instructions: 

• Reset. The control unit initializes all units to 
a known state. 

• Loading registers. The system loads pro¬ 
grammed data into the pattern register, the 
don't-care mask register, and the timer (which 
times the duration of monitoring). 

• Start. Once the SBC has loaded the regis¬ 
ters, it issues a Start command to the monitor. 
The control unit then enables the event filter, 
edge detector, counter, and buffer unit as re¬ 
quired. Once the specified time duration elapses 
or the system identifies an error such as buffer 
overflow, the control unit stops all units and 
raises an interrupt line to inform the SBC. 

• Indefinite time. If the SBC requires that the 
monitoring continue for an indefinite period of 
time, it does not write into the timer. Instead, it 
issues the Indefinite Time command to the 
monitor in place of the Start command. The 
control unit then activates all units until an error 
arises or it receives a Halt command from the 
SBC. 

• Halt. This instruction overrides any previ¬ 
ous command, stops system monitoring, and 
also brings the monitor to an input-accepting 
state. 

• Read status. An SBC uses this instruction to 
determine the status of the monitor by reading 
the contents of the status register. 

• Send data. Once the data is collected in the 
buffer—or the buffer has become full—it needs 
to be emptied. The SBC issues a Send Data com¬ 
mand to the monitor to begin this process. Data 
transfer can occur in parallel with the rest of the 
monitor operations. 

• Self test. The monitor can test itself (see the 
section on this unit). 

The state diagram in Figure 8 on the next page 
depicts the overall control-unit operations. 

The control unit uses erasable programmable 
read-only memory. Four EPROMs implement a 
horizontal microcode to generate all control 
signals. The EPROMs function differentially to 
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Figure 8. State diagram of the control unit. 


• service data-send operations, 

• process inputs from the SBC, 

• collect data into Buffer 0, and 

• collect data into Buffer 1. 

Interference. Removing the effects of interference 
from the monitor or the data is an important control 
function. An active monitor generates interference. At 
that time, the SBC sends instructions to the monitor via 
the Multibus. Because this signal is not part of the host 
system and is introduced only when the monitor is 
present, we do not take it into account. Disabling all 
units of the monitor when the SBC tries to access it 
removes the interference. 


Self-test unit. This unit essentially simulates the 
activity on the bus. It loads a pattern into the registers 
and starts the monitor. It then generates patterns and 
applies them to the bus register. The event filter com¬ 
pares and collects data as in a normal operation. 

At the end of the specified time, the self-test unit 
compares the data collected by the monitor in the 
buffers with the prestored data. If the two match byte- 
for-byte, the monitor functions properly. If the match is 
defective, the monitor could enter an internal, infinite 
loop. A watchdog timer 7 that raises an interrupt after a 
certain period of time can force the monitor out of the 
loop. 
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Applications 

Monitors are essential tools for performance evalu¬ 
ation and management in the field of distributed proc¬ 
essing. Because monitors are functionally multidimen¬ 
sional, we see applications in several of the following 
areas. 

Event recorder. The recording of events—the pri¬ 
mary role of a monitor—allows event simulation at any 
arbitrary point of time. Specifically, the time mode of 
operation provides straightforward reconstruction of 
events. Event recording and playback are also very 
conducive to catastrophe studies (detection as well as 
prevention). 

Debugging tool. Debugging a multiprocessor sys¬ 
tem is a complex process. A monitor can assist greatly 
by isolating problem areas as it seeks specific faults or 
patterns. 

Fault-tolerance aid. One can design a monitor to 
detect failures in nodes as well as in communication 
links. Our system functions as both a cluster monitor 
and a network monitor in Ted. A control processor 
subsequently takes corrective actions based on the 
monitoring result. A playback of events also allows 
recovery or restarting from the point of failure. 

Universal monitor. We designed and tested the 
present monitor for the Multibus architecture. How¬ 
ever, it is suitable for any arbitrary bus, with a few 
modifications. The captured data has to be interpreted 
for each type of bus architecture. Once this is achieved, 
one can connect clusters of different architectures. The 
universal monitor can evaluate, interpret, and manage 
the cluster. Thus one can evaluate different bus features 
with the same instrument. 


W e have designed and implemented a hardware 
monitor to handle the monitoring activities 
within a cluster in the Ted system. By using 
specified patterns and don’t-care masks, the system can 
detect accesses to selected data, addresses, or blocks of 
addresses. This function helps monitor events such as 
the access or usage of a memory location or a group of 
mailbox addresses. It also determines the amount of 
time consumed by the performance of specific opera¬ 
tions. 

We plan to accomplish active monitoring in which an 
external computer system connected to an Ethernet 
network controls a Ted system, si 
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T he PI073 Medical Information Bus (MIB) family of three proposed 
IEEE standards provide vendor-independent interconnection and 
interoperability of medical devices and computer systems. The MIB 
uses a layered network architecture that is compatible with International 
Standards Organization/Open Systems Interconnection (ISO/OSI) specifi¬ 
cations. 1 The proposed standards primarily focus on the acute care environ¬ 
ment at the patient’s bedside, particularly when this environment is in the 
intensive care area of the hospital. However, the MIB is suitable for use in 
other areas such as the operating room or the general ward. The bus can also 
interface to any general clinical equipment for simple data communication 
and control purposes. 

The proposed standards provide for a specialized local area network 
between one or more groupings of medical instruments and a host computer 
system. The host system can operate from the bedside or from a remotely 
located area, where it can be accessed via a conventional LAN. 

The MIB differs from a conventional LAN due to the unique requirements 
of the medical environment. Frequent changes in the number and types of 
devices at the bedside require automatic reconfiguration of the network as 
devices are connected or disconnected. Medical users that participated in 
our project required that the devices and systems interface with an absolute 
minimum of user interaction. The MIB network automatically detects 
device connections, establishes communications, and identifies devices. 
The user merely plugs in a data cable and switches on the device. 

P1073 also specifies the Medical Device Data Language. The MDDL 
communicates measurement data, status, and control information between 
the host system and the medical instruments. 2 Clinical data can involve 
much more than simple numeric values, particularly if the device is aware 
of parameters concerning the context of the measurement. These parameters 
can include measurement units, sites, and methods, as well as the time the 
measurement took place. 


An earlier version of this article appeared in the 1989 Proceedings of the 8th Annual 
International Phoenix Conference on Computers and Communications. 
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Figure 1. The MIB network nodes. 


Intelligent monitoring devices with numerous modes 
of operation and alarms communicate not only the 
clinical data collected by the device but also status data 
concerning the current state of the device itself. Thera¬ 
peutic instruments such as ventilators and closed-loop 
infusion systems require that device-control informa¬ 
tion be included in the list of communications require¬ 
ments. 

The MDDL provides operations for exchanging 
clinical data in context, including both the measured 
values and those attributes that describe the 
measurements known to the device. The MDDL also 
provides mechanisms to handle the commnications 
necessary for monitoring device status and handling 
device-control information. Altogether, the MDDL 
provides a comprehensive, relatively complex set of 
device-related measurement operations necessary for 
host-system and intelligent-device functioning. 
However, one can also streamline the MDDL into a 
very compact form for efficient implementation on 
simple measurement devices that supply only numeric 
data without any context. 2 


Nodes on the network 

The MIB requires three different types of logic nodes 
to form an operational network (Figure 1). A medical 
device interfaces to the MIB through a device-commu¬ 
nications controller. This DCC connects to a bedside- 
communications controller to form a local subnetwork 
at that location. The BCC functions as a concentrator or 
multiplexer between the bedside medical device and 


the host-interface node. 

The DCC contains hardware for a plug-compatible 
interface and software for reliable communications 
between the medical device and the MIB. The BCC 
contains hardware—and the minimum control soft¬ 
ware—for reliable data transfer between the DCC and 
the host interface. The BCC automatically detects 
device connections and disconnections at the bedside. 
The host-interface node is a software interface between 
the MIB and the host computer that is responsible for 
collecting data from the bedside. 

The MIB's flexibility allows either workstations or 
monitors at the bedside to serve as hosts. One or several 
MIB nodes can reside in the same physical instrument 
or system. Intelligent monitoring instruments can per¬ 
form BCC functions, or workstations at the bedside can 
perform both host and BCC functions. Or, with an 
enhanced BCC, a conventional LAN connection can 
access remotely located minicomputer hosts. Just as 
many DCCs can connect to one BCC, multiple BCCs 
can communicate with a single host system that serves 
multiple bedsides (see Figure 2 on the next page). 

One can build DCCs to handle multiple devices such 
as multichamber infusion pumps, or devices with vary¬ 
ing parameters such as monitoring systems with plug¬ 
in modules. Generic add-on DCCs can interface exist¬ 
ing simple measurement devices to the P1073 network. 

PI073 describes a method for measuring the data 
load at a bedside. This procedure aids proper configu¬ 
ration of an MIB network for a specific installation. We 
measure the amount of data that a particular device 
places on the network in MIB loading units. We de¬ 
scribe devices and their DCCs by the number of loading 
units they need for operation on the network. Similarly, 
we rate BCCs and hosts by their processing capacity in 
terms of loading units. We can size and produce BCCs 
in different capacities according to the load require¬ 
ments of different bedside environments (operating 
rooms, intensive care units, or the general ward). The 
hospital’s biomedical or clinical engineers determine 
the number of loading units needed at a particular 
bedside, along with the number of MIB outlets. 


Family of proposed standards 

Here we present the general specifications of each 
document in the P1073 group. A more technical de¬ 
scription follows in the next section. 

P1073.1, Medical Information Bus: Network 
Architecture and Application Interface. This docu¬ 
ment describes the upper three OSI layers—applica¬ 
tion, presentation, and session—as well as the overall 
network architecture. The application layer specifies 
the use of ISO protocols for remote-operations and 
association-control service elements (ROSEs 3 - 4 and 
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Figure 2. The MIB interbed and intrabed networks. (Courtesy of Tim Kelly, C.R. Bard, Inc.) 


ACSEs 5,6 ), as well as medical device application serv¬ 
ice elements (MDASEs). (See Figure 3 and Table 1.) 

P1073.2, Medical Information Bus: Bedside 
Communications Subnetwork. This proposal de¬ 
scribes the subnetwork that provides the transport serv¬ 
ice between the DCC and the BCC. This subnetwork 
consists of the lower four OSI layers: transport, net¬ 
work, data link, and physical. The inactive transport 
and network layers do not allow packet assembly or 
multiplexing. The data-link layer uses a specified sub¬ 
set of high-level, data-link control (HDLC 7 ) proce¬ 
dures and a strict polling protocol to provide reliable 
link service. (See Figure 3 and Table 1.) 

P1073.3, Medical Information Bus: Gateway 
Function Between a P1073.2-compliant Network 
and an ISO 8072/8073-compliant Network. This 


document describes the access function between a BCC 
and a host that is located on a conventional LAN (see 
Figure 4 and Table 2 on page 56). The specifications of 
this proposed standard are required only in the case of 
a remote host. A local host network need only use 
P1073.1 and P1073.2 specifications as shown in Figure 
3. We assume that connection-oriented transport serv¬ 
ices are available for a host on a conventional LAN. 
ISO 8072 8 and ISO 8073 9 describe the services and 
protocol for reliable end-to-end communications at the 
transport layer. 

P1073.3 provides for a gateway from a P1073.2 
subnetwork to a LAN. Current efforts (still in an early 
stage of development) focus on the issues that involve 
internetworking MIB, as we discuss later. These issues 
also include supporting multiple, simultaneous conver¬ 
sations between devices and hosts (multiple-associa¬ 
tion or application-level gateways). 
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Figure 3. The MIB protocol stacks. 



Table 1. 

Summary of MIB protocol stacks. 

Layer 

Name 

Description 

P1073.1 (upper layers) 


7 

ACSE/ROSE 

The kernel subset of ISO 8649/8650 (ACSE) and ISO 9072 
(ROSE) application layer services and MDASE 

6 

MIB presentation 

The kernel subset of ISO 8824/8825 presentation-layer 
services 

5 

MIB session 

ISO 8326/8327 session-layer kernel subset 

P1073.2 (lower layers) 


4 

MIB inactive transport (MIT) 

MIT service primitives and parameters identical to those for 

ISO. Most parameters defaulted in the MIT protocol. 

3 

MIB inactive network (MIN) 

A functionally inactive layer, with a control header specified 
for use in future revisions. 

2 

MIB data link 

A specified subset of HDLC, using two-way alternate, 
normal-response mode, with the BCC as the primary 
station and the DCCs as the secondary stations. 

1 

EIA-485 

Data transfer using Electronics Industry Association EIA-485 
interface 10 at a signaling rate of 375 Kbps with NRZI 
encoding. BCC provides 12VDC ± 2V current limited to 

250 mA of power to DCC. Specifies a unique 6-pin 
connector and a shielded cable with three twisted-wire 
pairs. 
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Figure 4. The MIB network with a remote host. 




Table 2. 

Summary of LAN protocol stacks. 

Layer 

Name 

Description 

4 

TP-4 

Transport protocol class that provides reliable data transfer over an assumably 
unreliable network (one that does not guarantee a delivery) such as IP (see 
below). 

4 

TP-1 

Transport protocol class that provides minimal data-transfer services over an 
assumably reliable network, such as X.25 (see below). 

3 

IP 

Internet protocol that provides basic network-layer, data-transfer services 

(including services between different networks) without guaranteed delivery. 

3 

X.25 PLP 

X.25 packet-level protocol that provides guaranteed network services at the layer 
data-transfer level. Subject to resets that must be handled by the transport 
layer. 

2 

802.2 LLC 

ANSI/IEEE Standard 802.2 (ISO/DIS 8802-2) 11 on logical link control that 
provides both connection-oriented and connectionless data-link procedures 
over a variety of different transmission media. 

1 

802.X 

Any of the family of ANSI/IEEE/ISO 802 1214 transmission media that specify 
media-access control protocols and physical layer encoding and transmission 
characteristics. Includes Carrier Sense Multiple Access with Collision 

Detection (CSMA/CD), token-ring, token-bus, and fiber-optic interfaces for 
distributed data. 


Technical information 

Here we discuss the salient technical features of each 
proposed standard at its current stage of development. 

P1073.1. As mentioned, this proposal discusses the 
overall MIB network and layers 5, 6, and 7. It differen¬ 
tiates between required and optional features, identi¬ 


fies unique characteristics and requirements of the net¬ 
work, and discusses functional requirements of the user 
interface. 

We are currently defining MDDL for possible use in 
a MDASE in the application layer (see Figure 5). A 
MDASE can make use of the ROSEs and ACSEs dis¬ 
cussed in the ISO Draft International Standards 9072- 
1 and 9072-2 3,4 (for model, notation, and service defini- 
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Figure 5. The MIB upper layer network architecture. 

tions for remote operations) and ISO Standards 8649/ 
50 5,6 (for the service definitions and protocol specifica¬ 
tions of ACSEs). MDDL previously had solely con¬ 
sisted of a structured set of ROSE operations, with 
parameters, attributes, and data types written in the 
Abstract Syntax Notation (ASN.l) specification. 1516 

MDDL has recently been redesigned as a flexible, 
extensible messaging system. It first specifies the rules 
for forming (or parsing) the messages. These rules in¬ 
clude specifications of required and optional fields. 
Hence, MDDL specifies a structure or template for the 
messages. 

Second, MDDL rarely restricts the actual contents or 
meaning for a given field in the message. However, 
MDDL medical-device messaging specifications de¬ 
fine the required fields and default values for each 
recognized class of medical device (monitors, ventila¬ 
tors, infusion pumps, etc.). Default values specify es¬ 
tablished terminology and encoding schemes, where 
these exist. For example, Systematized Nomenclature 
of Medicine, or Snomed, codes 17 can serve as the 
default specifications for the anatomical location of a 
measurement. International Medical Device Codes can 


provide a system for identifying the devices. 18 Systeme 
International units can serve as the defaults for units of 
measurement. 

PI073.1 specifies the kernel subsets of several ISO 
standards for the presentation and session layers. (ISO 
8824/25 15,16 concern ASN.l and its basic encoding 
rules, while ISO 8326/27 1920 specify basic connec- 
tions-oriented service definitions and protocols for the 
session layer.) These layers are required for compli¬ 
ance on the DCC and the host-interface nodes of the 
MIB network. 

Other specific user-interface requirements include 
positive visual indication of a connection to the MIB 
network and an operator button on the device to signal 
intentional disconnections. A two-digit display on 
certain DCCs allows operators to discriminate between 
multiple devices. Operators can then distinguish be¬ 
tween multiple infusion pumps for fluid-identification 
purposes, for example. 

P1073.2. This proposal specifies the lower level 
architecture for the MIB network. In addition to the 
services and protocols for layers 1 through 4, the docu- 
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merit specifies the hardware, cabling, and connectors 
necessary for plug compatibility between DCCs and 
BCCs. 

We optimized the P1073.2 subnetwork for use in the 
medical environment. The proposal focuses on provid¬ 
ing fast, reliable communications while also detecting 
device connections/disconnections. The bedside 
subnetwork consists of a star point-to-point physical 
topology (Figure 2). As a result, network addresses are 
location-dependent within a P1073.2 subnetwork. The 
BCC associates every connected device with a particu¬ 
lar bedside (patient). During connection, the device 
specifies quality-of-service (QOS) parameters for only 
the maximum and minimum throughputs required in 
each direction. These parameters translate into MIB 
loading units during the connection process, as de¬ 
scribed later. The subnetwork defaults all other QOS 
parameters. 

Transport layer. P1073.2 provides transport-layer 
services between a DCC and a BCC. The MIB inactive- 
transport (MIT) layer service primitives appear similar 
to ISO 8072/8073 transport services, but do not strictly 
comply with ISO 8072/8073. MIT does not support 
packet assembly and disassembly and contains a sim¬ 
plified set of active parameters in its connection-serv¬ 
ice primitives due to null address fields and other 
defaulted QOS parameters. A single, unconfirmed, 
data-transmission-request primitive and an indication 
primitive exist. Disconnection service, which is unac¬ 
knowledged by the bus, destroys any queued packets. 
Only a DCC can initiate connection requests. Both 
DCCs and BCCs can initiate data-transfer and discon¬ 
nection services. 

Network layer. Although the MIB inactive-network 
(MIN) layer is functionally inactive, it does have serv¬ 
ice primitives for connection, data transfer, and discon¬ 
nection. The layer accepts service data units from the 
transport layer and adds a 1-byte, protocol-control 
information (PCI) field (as does the MIB transport 
layer). The PCI’s presence in the network and transport 
layers allows backward compatibility in the case of a 
new MIB lower layer protocol set defined in the future 
with mulitplexing or packet-assembly/disassembly 
functions. 

Data-link layer. This layer provides service primi¬ 
tives for connection-oriented, point-to-point data trans¬ 
fer. The protocol uses an optimal subset of the two-way 
alternate (TWA) normal-response mode (NRM) of 
HDLC elements of procedure ISO 4335. 7,8 (This stan¬ 
dard describes high-level data-link control procedures 
and consolidation of their elements in information 
processing systems.) The BCC functions as the primary 
station, and all DCCs function as secondary stations. 
The BCC employs a polling protocol to detect device 
connections/disconnections. 

Connection requests are negotiated through the ex¬ 
change of control packets. 21 Maximum-information 


field length is negotiated through the use of MIB load¬ 
ing units. The BCC polls empty ports by sending con¬ 
trol-packet frames that offer the maximum loading 
units available at that port. The connection request at 
the DCC specifies the loading units needed by the 
device. If the units offered are adequate, the DCC sends 
a control-packet frame in response to the control-packet 
poll. During control-packet frame exchange, the sys¬ 
tem sets the transmission window size to 1 and uses a 
sequence-number modulus of 8 for each end of the 
connection. 

Data-transfer services use information and supervi¬ 
sory frames. Either end can issue a disconnection re¬ 
quest, which is destructive to any queued packets and is 
unacknowledged. The poll/final bit functions as the 
token in the half-duplex transmissions. 

The P1073.2 proposed standard specifies strict re¬ 
sponse times and transmission-retry limits. Failure to 
respond to a BCC poll—or the absence of a poll at the 
DCC within a 1 -second period—breaks the connection. 
The detecting station must return the port to the discon¬ 
nected status. 

Physical layer. This layer uses EIA-485 specifica¬ 
tions with nonreturn to zero inverted (NRZI) encoding 
at 375 Kbps. The proposal specifies a unique 6-pin 
connector and a shielded 6-wire cable. One wire pair 
provides half-duplex data transmission. Another pair 
supplies 12VDC ± 2V current limited to 250 milliam- 
peres for optional use by the DCC communications 
hardware. An optional encoding mechanism on the 
third wire pair provides for time alignment between 
instruments. 


P1073.3. For local hosts—like an intelligent work¬ 
station at the bedside—the PI073.1 host interface uses 
the transport services provided by the PI073.2 subnet¬ 
work. A remote host that is accessible through a con¬ 
ventional LAN uses the transport services of that LAN. 
Before remote hosts can communicate with a P1073 
network, at least two major issues need resolution. One 
involves the internetworking between P1073 and an¬ 
other network. The second involves whether to support 
only single associations (one-to-one) or multiple asso¬ 
ciations (one-to-many) between a single device and its 
host(s). 

The original P1073.3 project authorization addresses 
only the first issue, and then only in a rigidly specified 
manner. The following paragraphs discuss this version 
of PI073.3. However, note that several other architec¬ 
tural solutions are possible (bridges and routers, for 
example). The multiple-access problem, having been 
recognized as an issue, awaits completion of P1073.1, 
which addresses the single-association case. 

P1073.3, Version 1 . Communicating with a remote 
host requires that an enhanced BCC provide a relay 
function between the transport services of the PI073.2 
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bedside subnetwork and those of the remote host. In 
addition to the usual P1073.2 functions and hardware, 
this enhanced BCC contains hardware and software 
associated with the LAN located between it and the 
remote host. The BCC must also contain control soft¬ 
ware for mapping the services and parameters between 
the two networks. As currently authorized, P1073.3 
specifies the relay function between the transport serv¬ 
ices of a P1073.2 network and an ISO 8072/73- 
compliant network. 8 ’ 9 

In the latter network, the calling transport service- 
access point (TSAP) is the DCC TSAP, and the called 
TSAP is the host TSAP. The host TSAP is stored in 
nonvolatile memory within the BCC during initializa¬ 
tion. The transport selector address field of the DCC 
TSAP is the physical port number of the device connec¬ 
tion inside the BCC. 

When a P1073.3 BCC receives a connection indica¬ 
tion from the P1073.2 transport layer, the BCC must 
initiate a transport-connection request on the ISO 8073 
network. When it receives a connection confirmation 
from that network, the BCC forwards a connection 
response to the DCC on the PI073.2 subnetwork. The 
QOS parameters passed between the two networks 
specify the level of needed service. P1073.2 QOS 
parameters contain almost all of the default values 
needed to tightly specify the required service. The BCC 
must map P1073.2 default values onto ISO 8073 para¬ 
meters, and vice versa, in a manner that is consistent 
with its knowledge of the requirements of the two 
networks. 

After the connection is established between the two 
networks, the BCC relays data indications received 
from either network as data requests to the other net¬ 
work. Disconnection indications from either network 
become a disconnection request to the other network. 

The current first rough draft of P1073.3 contains 
several options for BCC initialization, including opera¬ 
tor entry, host-initiated connection to the BCC, and 
BCC broadcast messages to find a host. 

Status of P1073 

Although early prototypes have existed for several 
years, 22 the design of the MIB network has evolved and 
changed considerably due to input from various sectors 
of the healthcare professions and industries. 

As of this writing (August 1989), the working group 
is performing the final editing of the P1073.2 draft 
document. Different parties have independently built at 
least two PI073.2 subnetwork prototypes, which were 
demonstrated as stand-alone systems. The last remain¬ 
ing step is demonstration of the interoperability of the 
two prototypes. We expect the final draft document to 
be passed to the IEEE for formal balloting and public 
comments before the end of this year. 

The group is currently revising the draft of the 
P1073.1 document to incorporate reference models for 


the network-architecture, virtual-device, MDDL-lan- 
guage, and MDDL-messaging specifications—along 
with the services and protocols for the upper three 
layers. We expect a substantive draft of P1073.1 by the 
first quarter of 1990. We have already demonstrated 
one significant early prototype that contains MDDL 
and the P1073.1 upper layers. We plan to demonstrate 
additional prototyping of the upper layers later this 
year. 

P1073.3 has proceeded through the rough-draft 
stage. The internetworking and gateway-access issues 
previously mentioned are under study. However, com¬ 
mittee efforts currently focus on the completion of 
P1073.1 and P1073.2. (Although P1073.3 is optional, 
PI073.1 and .2 are mandatory for compliance.) 

T he professional community has expressed consid¬ 
erable interest in the MIB, both locally and inter¬ 
nationally. We plan to hold a P1073 working 
group meeting at a major international healthcare or¬ 
ganization conference in 1990. We have established 
liaisons with other national and international healthcare 
standards groups. 

The refinement of MDDL rests on additional public 
input. Initial groups of classes of devices and corre¬ 
sponding message sets are being defined for the first 
version of MIB. More focus groups would provide 
further definition and refinement of additional device 
classes and message sets. Like the current PI073 work¬ 
ing group, these focus groups would consist of indi¬ 
viduals from the healthcare and computing industries, 
clinical users, the biomedical and clinical engineering 
communities, and other interested parties. 

Through a standard data interface and communica¬ 
tions network, the Medical Information Bus could fi¬ 
nally open the door to large-scale, automated collection 
and integration of clinical data into the electronic 
medical record, sfe 
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The Micon System 
Computer Design 



o 

emiconductor technology provides tremendous opportunities for 
high processing power and reliable, low-cost' computers in the 
office or laboratory. Many of these machines, ranging from PCs to 
superminicomputers, use commercially available microprocessor family 
components. The high-end microprocessors available today commonly sup¬ 
port virtual memory and main memory caches. A full array of high- 
performance dedicated processors (graphics controllers and numerical 
coprocessors) and communication components complement these devices. 
One can construct a very sophisticated machine almost entirely from off- 
the-shelf components. Semiconductor designers have just begun to tap a 
well of possibilities that promises even more performance and functions in 
future chips. 

All of this advancement, however, extracts a price. Sophisticated compo¬ 
nents require sophisticated hardware designers. High-performance proces¬ 
sors that commonly run in the 16- to 25-megahertz range force designers to 
become experts in high-speed logic design and computer architecture. 

The days of simple designs that consisted of a processor and a serial I/O 
interface are over. Today’s smallest computer system contains—at a mini¬ 
mum—disks, buses, and graphics-terminal interfaces. Workstations must 
use a minimum of three levels of memory hierarchy to achieve maximum 
performance. In addition, such traditional extras as networking and graph¬ 
ics are standard equipment. All of this integration places a strain on 
hardware designers who are trying to keep abreast of a rapidly evolving 
technology. 

In the marketplace—where product life cycles are measured in months— 
the competition continues to become more intense. Three forces impact the 
hardware designer: a reduced design time, the ceaseless demand for in¬ 
creased performance at a lower price, and a constantly evolving technology. 
It appears unlikely that these forces will abate of themselves. Therefore, 
designers need some assistance in the form of CAD tools. We consequently 
designed the microprocessor “configurer” system, or Micon, to provide 
support for computer hardware designers. 

The objective of Micon is to reduce the time required to construct a 
hardware system. The approach to achieving this objective rests on two 
points: 

An earlier version of this article appeared in Proceedings of the 26th ACM I IEEE Design 
Automation Conference, © 1989 ACM 0-89791-310-8/89/0006/0135 $1.50. 
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• capturing and disseminating design expertise in a 
format that actively assists designers, and 

• providing a tool environment that supports all as¬ 
pects of computer design. 

Numerous benefits accrue from making design ex¬ 
pertise commonly available in a computer program. 
The system reduces the learning curve that faces de¬ 
signers when a new component is introduced. It also 
expands the number of new components that a designer 
can consider. Other benefits include fewer design er¬ 
rors and less time spent in trial-and-error learning. 
These factors allow the designer more time to create al¬ 
ternative designs, or just to create a working design 
more quickly. 

The creation of computer hardware includes not only 
logic design and physical design but also reliability 
analysis. A design environment that supports multiple 
expert systems allows the designer to single-handedly 
address the entire spectrum of design activities. Such 
an environment handles data consistency issues more 
efficiently and results in fewer design errors. 

Micon employs a set of complementary technolo¬ 
gies. Artificial intelligence provides design synthesis 
and the acquisition of design expertise. Databases 
provide consistent views of data to all tools. Network¬ 
ing allows the system to efficiently share common 
resources, such as the database, across many users of 
the system. Here we describe the architecture of Micon, 
provide examples of its use, and describe a set of 
experiments to test its viability. We begin with com¬ 
parisons of Micon to other related design systems. 


Comparisons 

Design synthesis is the process of producing a design 
that meets some set of specifications. In general, the 
specifications are abstract relative to the details needed 
to build the design. The task of a synthesis system is to 
provide the missing information. The mechanisms used 
to implement the synthesis system vary as widely as the 
types of synthesis tasks. Here we discuss related syn¬ 
thesis work in digital domains, but work is also pro¬ 
gressing in other domains such as analog circuit design 1 
and mechanical engineering. 2 

Digital system synthesis is an area of particular 
interest in the CAD field. Interest has shifted from 
graphics-oriented, schematic-capture systems to fully 
automatic systems that can produce sophisticated, fully 
operational designs. 

Much of the recent work in digital system synthesis 
centers around the development of application-specific 
integrated circuits (ASICs). (We use the term applica¬ 
tion-specific very loosely here to connote systems that 
range from simple controllers to special-purpose 
microprocessors.) 

In a broad sense, two approaches exist in ASIC 


design, depending upon the type and abstraction level 
of the input and the resulting design. The first approach 
proceeds from a high-level behavioral description and 
ends with an IC. The types of designs produced typi¬ 
cally include microprocessors and minicomputer 
CPUs. The System Architect’s Workbench (SAW) 
project, 3 which evolved from the Carnegie Mellon 
University Design Automation (CMU-DA) system 
project, 4 synthesizes digital systems from a high-level 
behavioral language called Instruction Set Processor 
Specification (ISPS). 5 SAW’s objective is to produce a 
very large scale integration (VLSI) IC that implements 
the machine described by the ISPS input program. 

The synthesis process consists of transforming the 
behavioral description into a technology-independent 
dataflow representation called a value trace graph. The 
graph is then subjected to a set of complex operations 
that result in a data-path and controller design repre¬ 
sented as a set of register-transfer-level (RTL) compo¬ 
nents. The RTL representation binds to modules in an 
implementation technology, such as standard cell 
libraries. 

The second broad class of ASIC synthesis systems 
takes a lower level description of a system as input for 
implementation. The descriptions vary, but typically 
spring from any of the following: 

• an RTL hardware description language, 

• a set of Boolean equations, or 

• state tables. 

These tools produce sophisticated designs that are 
smaller than the systems previously discussed. Ex¬ 
amples of these designs, which are used as building 
blocks of larger digital machines, include program¬ 
mable logic arrays, state machines, and blocks of 
combinational logic. Logic synthesizers, which are 
typical of this class of tools, produce ASICs by using a 
set of predefined cells (for instance, standard cells or 
gate arrays). IBM has put forth a long-standing, suc¬ 
cessful effort in this area with the development of a 
production quality system called the Logic Synthesis 
System. 6 A number of other systems employ a similar 
paradigm to produce combinational logic and/or state 
machines. 7 

Designers are applying artificial intelligence to a 
wide range of synthesis problems. The appeal of AI is 
that it uses nonmathematically based representations 
(an example set of representations for digital systems is 
found in Bobrow et al. 8 ). AI can also use domain 
knowledge to reduce the search inherent in many syn¬ 
thesis problems. The success of the Xcon/Rl project 9 
illustrated the potential of AI technology for synthesiz¬ 
ing problems. The Design Automation Assistant sys¬ 
tem, 10 part of the CMU-DA system project, used do¬ 
main knowledge to aid in the generation of an IBM 
System 370 design. The VLSI Expert Editor system 11 is 
an interactive knowledge-based editor that assists a 
user in traversing a hierarchy of functional blocks to 
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Figure 1. The Micon system. 

create an N-channel, metal-oxide semiconductor im¬ 
plementation of a design. 

A survey of the literature reveals that the design of 
pieces of computers (for instance, CPUs)—rather than 
complete computer systems—has driven most of the 
current synthesis research. However, as design tools 
become more proficient, system-integration issues will 
gain prominence. The Micon project concerns the 
synthesis of systems through the configuration of exist¬ 
ing components. This approach emphasizes matching 
the interfaces between components (and in this way is 
similar to the Critter system, 12 although the intentions 
and implementations of the systems are very different). 

The Micon project originated with MO (a recently 
dubbed name), 13 a knowledge-based system that as¬ 
sessed the feasibility of automated single-board com¬ 
puter synthesis. This assessment involved commer¬ 
cially available microprocessors. MO embodied, in a 
more primitive form, many of the concepts used in the 
design of our final Ml synthesizing tool. Several mi¬ 
croprocessor families were hard-wired into the system. 
MO produced a number of designs. One of them, based 
on the Z80, resulted in a working single-board com¬ 
puter. MO also originated the idea of templates for rep¬ 
resenting structural information for synthesis. The 
system, however, had a number of drawbacks that 
limited its general applicability (see Gupta 14 for more 
details). These drawbacks led to the development of 
MO.5. 14 Even though M0.5 was not fully implemented, 
the ideas acquired from the project were essential to the 
final development of Ml. In particular, M0.5 intro¬ 
duced component abstraction, which enables Ml to 
have a much larger degree of freedom in choosing 
components and thereby creates a larger number of 
interesting designs. 


The Micon system 

We originally developed Micon to design small 
computer systems. (We are now applying Micon to 
design problems in other domains such as design envi¬ 
ronments, or frameworks, and mechanical engineer¬ 
ing. 15 ) The computers designed by Micon consist of the 
following subsystems: 

• Processors. The designs employ a microprocessor 
as the CPU. Presently, Micon can design with the 
Motorola MC6809, MC68008, and MC68010 and the 
Intel 80386. 

• Memory. Designs contain read-only memory 
(ROM) and either static or dynamic random-access 
memory (SRAM or DRAM). In addition, Micon also 
supports the use of cache memories when support chips 
exist in the microprocessor family, such as in the Intel 
80386/80385 combination. 

• Peripherals. All of the I/O operations are per¬ 
formed by dedicated parts such as serial and parallel 
I/O (SIO, PIO) devices. I/O devices can use standard 
interfaces such as RS-232Cs. 

• Bus interfaces. The system supports a selection of 
standard bus interfaces (like the IBM-AT bus). 

• Support circuitry. A set of circuitry provides sup¬ 
port functions. This circuitry includes address decod¬ 
ers, wait-state generators, oscillators, and bus drivers, 
as well as specialized structures for improving system 
reliability, such as Hamming encoders/decoders. 

The complexity of a Micon design roughly equals that 
of a small workstation. 

As we intended Micon to provide a rapid prototyping 
facility, the overriding concern for the synthesizing 
system Ml was whether it would quickly generate 
working designs. Design costs—as measured by a 
variety of metrics including part cost, board area, and 
power dissipation—provide a framework for evaluat¬ 
ing trade-offs. Since designs will not be produced in 
volume, a user is typically willing to lessen the cost 
constraints to obtain a design quickly. Cost optimiza¬ 
tion is not necessary. In general, designs produced by 
Micon compare favorably to human-generated designs. 
(The section on experiments and other experience 
provides more details.) 

Figure 1 shows the Micon system. Note that we made 
a distinction between input from the user or the domain 
expert. The domain expert teaches M1 through the code 
generator (Cgen) how to design target structures and 
uses the data-entry tool to add new part models to the 
database. The user creates designs with M1 by exploit¬ 
ing the work of the domain expert. If the domain expert 
imparts sufficient knowledge to M1, the user need only 
be as knowledgeable as a novice hardware designer. 

Input to Ml is a set of high-level specifications that 
describe the functionality required of the computer. For 
example, users can specify the type of microprocessor, 
amount and type of memory, and the number and type 
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of I/O devices required. Users can also input an addi¬ 
tional set of parameters describing multiple objective 
criteria, such as design constraints (board size, cost) 
and system-reliability requirements. M1 uses its knowl¬ 
edge of components and microprocessor system struc¬ 
tures to develop a design that satisfies the requirements 
given to the system. During the design process, Ml 
interacts with the user to resolve trade-offs in the 
evolving design. 

Ml is a rule-based system written in the OPS/83 
development language. The rule base consists of knowl¬ 
edge about the components Ml can use in the design, 
microprocessor system-design techniques, and design- 
for-reliability techniques. Cgen updates the rule base 
with newly developed components and design tech¬ 
niques to produce competitive designs. 

Automated Synthesis for Reliability (Assure 16 ) is a 
submodule within Ml that analyzes and modifies de¬ 
signs for improved reliability. Assure’s input consists 
of reliability criteria such as mean time to failure 
(MTTF) or mission time. Assure uses external reliabil¬ 
ity tools to analyze the quality of the evolving design. 
Designs that fail to meet the user-specified criteria are 
modified by a variety of techniques, ranging from 
simple IC packaging changes to the addition of entire 
structures (for example, error-correcting codes on 
memory). While Assure is closely tied to Ml, it has a 
separate, distinct, problem-solving architecture. 

Cgen 17 is the knowledge-acquisition tool for the Ml 
module. Typically, a rule base can only be updated by 
those intimately familiar with the program, which 
makes the acquisition of knowledge difficult. Cgen 
allows a hardware designer who is not familiar with 
Ml’s implementation details to add expertise to the rule 
base. Inputs to Cgen consist of schematic drawings and 
simple equations. 

A suite of tools that perform the physical design 
function transforms the output of Ml into a complete 
computer. 

Physical design consists of three steps. The first is 
placement and routing of the design for a board. A set 
of commercial physical design tools called Tango-PCB 
completes this task. The second step is fabricating the 
board. Two board-fabrication technologies are avail¬ 
able: in-house wire-wrapping and printed circuit board 
(PCB) fabrication performed at a local PCB manufac¬ 
turer. Populating the board with components and test¬ 
ing it comprise the final step. 

The database serves as the central repository for the 
part models used by all modules in the system. We used 
a commercially available product, Informix, to build 
the database. All modules receive data from a common 
central database to ensure consistent information in the 
Micon system. 

The Micon system is tightly integrated. All tools 
communicate via Unix interprocessor-communication 
links to the database. The database uses a server to 


Client 1 Client 2 Client 3 Client 4 

(VAX 785) (MicroVAX) (MicroVAX) (VAX 8800) 



Figure 2. Database-server communications. 

establish communication links with each tool, or client. 
This scheme allows the database to run as a separate 
process. In addition to providing a great deal of flexibil¬ 
ity—such as serving a large number of clients at the 
same time—it allows the dedication of a single work¬ 
station on the network as the database node. This 
arrangement centralizes the database management 
functions and eliminates many inconsistency prob¬ 
lems. Figure 2 depicts the database server communicat¬ 
ing with separate workstations across the network. 

Experiments and other 
experience 

We subjected Micon, as a working system, to a set of 
experiments to test its design capabilities. These ex¬ 
periments involved the following steps: 

• teaching the system, through Cgen, to design with 
the set of microprocessors mentioned previously; 

• exercising Ml to generate a set of designs; and 

• passing an M1 -generated design through the physi¬ 
cal design and manufacturing processes. 

Except for Birmingham, the designers who partici¬ 
pated in the experiments had not implemented Micon. 

The designers added data and knowledge concerning 
a large number of components to the system. The 
number of part models equaled 382. (See Birmingham 
and Siewiorek 17 for complete details.) The database 
contained every type of part necessary to build the 
computers, ranging from microprocessors to NAND 
gates and capacitors to RS-232C port connectors (DB- 
25s). Ml’s knowledge base contained over 919 design 
rules, all of which were created by Cgen. These rules 
described how to design with each of the parts in the 
database. 

With its knowledge base and database, M1 can create 
an interesting variety of designs. Since knowledge 
concerning parts is individually acquired, Micon can 
configure unique combinations of parts into a design. 
The ultimate limitation on the designs comes from: 
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Table 1. 

Partial specifications of Ml designs. 



Design specifications 



Results 


Set 

Processor 

SRAM 

(Kbytes) 

ROM 

(Kbytes) 

No. 

of PIOs 

No. 

of SIOs 

Part 

count 

Actual 

runtime* 

(seconds) 

Total 

runtime** 

(seconds) 

1 

MC68008 

32 

4 

1 

0 

20 

250 

1,552 

2 

MC68008 

60 

4 

1 

0 

20 

257 

1,591 

3 

MC68008 

60 

4 

0 

1 

28 

309 

1,949 

4 

MC68008 

60 

4 

1 

1 

31 

354 

4,000 t 

5 

MC68008 

60 

4 

2 

1 

32 

384 

2,459 

6 

MC6809 

32 

4 

1 

0 

17 

227 

1,991 

7 

MC6809 

32 

4 

0 

1 

25 

292 

2,199 

8 

MC6809 

32 

4 

1 

1 

27 

331 

4,067 f 

9 

MC6809 

60 

4 

1 

1 

27 

334 

4,122 t 

10 

MC6809 

60 

4 

2 

1 

28 

361 

2,277 

11 

80386 

60 

8 

0 

1 

79 

965 

4,580 

* On a MicroVAX II. Unix Version 4.3 BSD. 






** Actual runtime plus time spent accessing shared database over the network. Database server runs on another MicroVAX II, Unix 

Version 4.3 BSD. 








t Unusually large value for these items due to the sharing of the database with other programs running at the same time. 


• the size of the memory space, 

• the size of the I/O space (if applicable), and 

• physical constraints (power, area, cost). 

Table 1 shows the specifications used for a set of 
designs. These designs are a small sample of the Ml’s 
diversity and range. As we expected, the part count and 
runtimes increase for designs with more functionality 
(larger memory, more I/O ports). Notice that specifica¬ 
tion sets 1 and 2 (and also sets 8 and 9) are the same 
except for the amount of SRAM. To build a larger 
memory, Ml used a memory chip of larger density that 
resulted in the same part count. Also, set 5 just has one 
more PIO port than set 4 (as in sets 9 and 10). As the 
MC6821 chip used for the first PIO port actually has 
two ports, M1 uses the second port for set 5. Hence, the 
design for set 5 has only one more part than that for set 
4. The additional part is a connector for the second port. 

We have implemented the design Ml produced for 
the MC68008 using specification set 5 from Table 1. 
Analysis of the design showed the following. 

• The design worked at the expected clock rates. 

• Since the design was built almost entirely from 
VLSI components, it compared favorably with hand 
design in terms of part count, as expected. 

We verified other Ml-generated designs by manu¬ 
ally converting the wire list to a schematic. We checked 
the correctness of the design when we created the 


schematic drawings. During this process, we found two 
categories of errors: 

• Missing parts. Some schematic drawings mistak¬ 
enly did not contain several support parts (for example, 
pull-up resistors and bypass capacitors). 

• Missing connections. Ml did not make some neces¬ 
sary connections between components. 

The errors all stemmed from mistakes in the schematics 
input to Cgen, which were propagated to Ml’s knowl¬ 
edge base. We corrected the schematics containing the 
errors, and Cgen generated new rules, which were 
inserted into Ml’s rule base. All designs became bug 
free. 

Assure analyzed and modified an MC6809-based 
design produced by Ml for improved system reliabil¬ 
ity. Figure 3 on the next page graphs the steps taken by 
Assure in terms of the reliability goal, which was to 
maximize the MTTF. 

Assure successively identifies the least reliable sub¬ 
system or part and applies a reliability-enhancement 
technique to it. Assure starts by applying simpler fault- 
intolerant techniques (like changing a package from 
plastic to ceramic or upgrading the quality of the pack¬ 
age) until these result in marginal improvements. These 
improvements correspond to the flattening-out of the 
curve in Figure 3 (after five steps). 

Assure next applies fault-tolerant techniques for the 
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most failure-prone subsystem. Here, Assure added 
Single-Error Correcting/Double-Error Detecting 
(SEC/DED) structures to the SRAM and ROM array. 
Assure also attempted to triplicate the processor and 
the I/O peripherals but found that the fault tolerance did 
not compensate for the failure rate of additional support 
circuitry. Hence, Assure rejected those changes. 

Since the experimental period, a user group has 
employed Micon extensively. The group has grown to 
include over a dozen persons, with participation from 
industry. In fact, we held two workshops with industrial 
participants. Micon functioned without flaw in this 
more realistic environment. (The industrial partici¬ 
pants, while not malevolent, focused on testing the 
capabilities and identifying the flaws of the system.) 
Micon is currently being evaluated at several industrial 
sites. 

The system has recently generated more sophisti¬ 
cated designs than those described here. In particular, 
the system generated an Intel 80386 design with cache, 
coprocessors, and AT-bus interface. The size of the 
system has grown to encompass over 500 part models 
and over 2,500 Cgen-acquired rules. We are beginning 
to add knowledge to Ml's rule base for building micro¬ 
processors. We are also developing the capability to 
form a solid model of the design to evaluate its me¬ 
chanical and thermal properties by using a mechanical 
modeling, air-flow analysis package. 

Micon also functions as an element in a microproces¬ 
sor interfacing course. As part of the laboratory re¬ 
quirement, students build knowledge bases of 80X86 
family components and then generate and verify de¬ 
signs. Micon is a very powerful aid for teaching stu¬ 
dents good design methodology, a topic that is difficult 
to cover in the short period of one semester. Students in 
this class gain exposure to design issues not generally 
experienced until they begin industrial jobs. 


T he Micon system supports all aspects of computer 
design from logic synthesis to manufacture. The 
system can incorporate new technologies readily, 
which is a major strength that allows it to track current 
technology. Micon has proven its design capabilities 
through extensive use both inside and outside of the 
laboratory. 

We continue to research Micon and are creating 
several aggressive designs with the system. In addition, 
we are porting Micon to domains outside of computer 
design. §jl 
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SPECIAL FEATURE 



A neural net 
replicated on a 
VLSI chip 
performs tasks 
formerly 
confined to 
software-only 
implementations. 


Mark Walker 
Paul Hosier 
Lex Akers 


T he need to develop systems that can replicate neural-like behavior in 
real time motivates the pursuit of hardware implementations of 
artificial neural network models. Neural-like behaviors include 
learning by example, the ability to adapt to different tasks, and fault 
tolerance. 

The structure and function of biological systems inspire neural network 
designers. However, design rules and material characteristics that differ 
from these systems severely complicate the task of mapping the relevant 
characteristics of theoretical neural models into architectures constrained 
by IC technology. The most significant barrier is the high degree of 
connectivity required between the processing elements or neurons. 

Neural network models in software generally consist of many very 
densely interconnected processing elements, each of which performs a 
simple computation in parallel with its neighbors. At present, very large 
scale integration, interconnection technology does not support the direct 
implementation of large-scale networks of this type. The area required to 
route connections and to contain the average length of interconnections 
increases at unacceptable rates as more processing elements are added. 

Here we present an analog complementary metal-oxide semiconductor 
(CMOS) version of a model for pattern association, along with discussions 
of design philosophy, electrical results, and a chip architecture for a 512- 
element, feed-forward IC. 


Arizona State University 


An earlier version of this article appeared in the Proceedings of the 8th Annual International 
Phoenix Conference on Computers and Communications , 1989. 
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Hardware implementations 

Researchers have proposed—and actually fabri¬ 
cated—a number of hardware realizations of neural 
networks. A majority of the designs are digital and 
employ either standard digital VLSI technology 1 or 
more specialized architectures such as digital signal 
processors 2 or systolic arrays. 3 Researchers have also 
proposed or fabricated analog 4 and hybrid analog/digi¬ 
tal 5 ' 8 designs. 

Electronic neural networks also differ in the manner 
in which the weights are stored (see the processing- 
element design section). Some designs store weights as 
binary vectors, 7,8 while others represent weights in the 
form of dynamic charges stored on capacitors. 5,6 Still 
others represent weights as static charges stored in 
electrically erasable programmable read-only memory 
devices (EEPROMs). 9 Current designs employ either 
limited interconnections 5 or some means of achieving 
full interconnections. 10 

Comparing the neural models implemented by dif¬ 
ferent architectures provides additional insight. Neural 
network models generally fall into four main classifica¬ 
tions according to which neural-like behaviors they 
replicate. These categories are 

• autoassociation, 

• input classification, 

• regularity detection, and 

• pattern association." 

Several models can perform tasks that span several 
categories. Many of the neural architectures are de¬ 
signed to perform the task of autoassociation, or asso¬ 
ciative memory. Autoassociation typically requires a 
fully connected, single-layer (each processing element 
or node connected to all others) network with symmet¬ 
ric, bidirectional transmittances between processing 
elements. 12 The magnitudes of the transmittances are 
variable parameters of the network and can be adapted 
with a learning paradigm. This process gives rise to a 
distributed storage of previously trained responses. 13 
The system can require inputs in the form of binary 
vectors. The entire network makes excursions through 
state space until it reaches a stable state. The final state 
is usually a previously stored vector that represents the 
pattern that most closely matches the original input. 

Perhaps the best known hardware realization of an 
associative network architecture is the one designed by 
the AT&T Bell Laboratories’ Holmdel group. 4 This 
chip is an analog implementation consisting of sum¬ 
ming amplifiers globally connected by thin-film resis¬ 
tors. Some researchers have suggested networks of this 
type for application in systems where “noisy” inputs 
must associate with codes representing a finite set of 
stored symbols. 

Hardware implementations of networks that can ac¬ 
complish arbitrary pattern association are less com- 


Neural networks could 
serve as very powerful 
computational paradigms 
for hardware realization. 


mon, since they generally require layered approaches. 
Also, the transmittances require more complex learn¬ 
ing algorithms to achieve adaptation. 

Probably the most well-known model in the layered 
network category is the perception , 14 In many applica¬ 
tions, the perceptron is adapted with an algorithm that 
is a modified version of least-mean-square learning. 15 
This algorithm is known as the Delta Rule or, more 
commonly, back propagation. 16 Using the Delta Rule, 
the model trains layered networks by repeatedly pre¬ 
senting pairs of example vectors at the input and output 
layers. It then incrementally adjusts the internal net¬ 
work transmittances so that when the first vector in 
each pair is presented as input, the second appears as 
output. 

When the model adapts the magnitudes of the intra¬ 
layer transmittances to produce the set of desired out¬ 
puts for the applied, trained inputs, the network devel¬ 
ops a sensitivity to the general, invariant properties or 
“rules” relating input to output. Generalization can 
occur when a learning algorithm endows a network 
with a general sensitivity to these properties. This 
process allows correct inference of outputs not in¬ 
cluded in the training set. Neural networks therefore 
could serve as very powerful computational paradigms 
for hardware realization. They can theoretically adapt 
arbitrarily complex I/O functions through training. 

Unlike one-layer networks, the multilayered per¬ 
ceptron employs feedback only during training. This 
approach allows it to operate in an essentially feed¬ 
forward, deterministic manner after training is com¬ 
pleted. This model is therefore well suited for discrete¬ 
time operation when it is embedded in larger digital 
systems. It serves as the basis for the limited-intercon¬ 
nection, CMOS network we discuss. 

Effect of limited 
interconnections 

We have simulated the limited-interconnection net¬ 
work to determine the performance limitations im¬ 
posed by limited fan-in at each node as well as the effect 
of using a fixed topology architecture for different 
tasks. We trained the network in separate runs to per- 
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Figure 1. Network response error as a function of training. 


form three different operations on an 8-bit, digital input 
vector using the Delta Rule. The tasks implemented 
included inversion, shifting to the right, and unsigned, 
one’s-complement addition. Well-posed problems of 
this type are usually poor candidates for neural network 
implementation. They are, however, good worst case 
algorithms for evaluating this architecture, since they 
require a one-to-one mapping (eight-dimensional input 
space to eight-dimensional output space). In addition, 
the shifting and inversion input spaces are linearly 
separable along the desired hyperplanes. 

The total number of connections within an adaptive 
filter determines the degrees of freedom available to the 
network in forming hypersurfaces that correctly divide 
the input space. 17 Consequently, a loose design con¬ 
straint for limited-interconnection, multilayered neural 
networks is that they must contain at least as many 
connections as the smallest, fully connected structure 
that implements the same function. A fully connected, 
two-layer perceptron (one layer of “hidden” units) is 
generally considered to be the smallest structure that 
can generate arbitrarily complex mappings. 18 In these 
tests, a two-layer network with 10 hidden units was the 
smallest fully connected structure that implemented all 
three tasks with an acceptable amount of error. 

Because double-metal CMOS design rules limit the 
generation of dense interconnection patterns, we se¬ 
lected a fan-in of four for each processing element. A 
higher number of possible inputs at each cell would 
require channel-routing schemes between adjacent 
layers. The final design of the limited-interconnection 
structure had approximately the same total number of 
connections as the fully connected structure. It main¬ 
tained the required fan-in as a four-layer network with 
three hidden layers of 10 units each. More precise 
design methods for feed-forward networks that form 
distributed internal representations are not yet avail¬ 



Figure 2. Network response error as a function of failed 
connections (80 percent of all possible 8-bit example 
pairs were trained). 


able. Designers usually determine empirically the 
number of internal processing elements required in 
actual applications of perceptron networks. 

Our first set of results demonstrates the ability of the 
limited-interconnection structure to generalize un¬ 
trained outputs from a limited set of examples. Figure 

1 shows the response error as a function of training. 
Even though the three tasks for which the network was 
trained have very different governing algorithms, each 
network inferred untrained output in an almost identi¬ 
cally linear fashion. 

The second simulation demonstrates the relative 
fault tolerance of the limited-interconnection architec¬ 
ture. In this experiment, an increasing number of ran¬ 
domly selected connections within the network were 
selected and broken (transmittance set to zero). Figure 

2 demonstrates that the networks performing inversion 
and shifting continued to function at the same level in 
spite of a small number of broken connections. The 
actual amount of error introduced in the output, how¬ 
ever, depended on the particular connection broken and 
the task learned. The Delta Rule randomly determined 
the relative importance of each connection in generat¬ 
ing the desired mapping. Also, since we are employing 
a fixed network topology for different tasks, the amount 
of redundancy within the network depends on the func¬ 
tion implemented. For these reasons, many of the 
weights may contribute little to the desired behavior of 
the network and eliminating them does not signifi¬ 
cantly degrade performance. 

The network performing addition, however, did not 
survive the loss of a few connections. The number of 
connections provided apparently did not allow for 
redundancy. Greater fault tolerance might have oc¬ 
curred with a larger network. 
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Figure 4. Analog synthetic neural cell. 
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_ Weighted connection 


Figure 3. Network interconnection scheme. 


Network design 

The number of devices on a die have increased in a 
exponential fashion over the last 28 years. Regretfully, 
the number of interconnection layers has not increased 
at this rate. This state of affairs causes serious restric¬ 
tions on information flow in VLSI and ultra LSI sys¬ 
tems. 19 It also has important consequences for hardware 
implementations of synthetic neural networks. We must 
design cells and networks that do not require high 
interconnectivity when they are scaled to large num¬ 
bers of processing elements. As indicated, we can 
eliminate global interconnections between layers by 
substituting local interconnections and additional lay¬ 
ers. This is an excellent trade-off for VLSI technology 
because we can rapidly design and implement large 
systems by replicating the cells in both row and column 
directions. We need only modify the select logic on the 
die. Figure 3 shows the symbolic layout of a 12-input 
network implemented with this approach. 


Processing-element design 

In most layered perceptron models, each processing 
element operates by taking a weighted sum of its inputs. 
The sum then passes through a threshold function or a 
high-gain, sigmoidal nonlinearity. The compactness of 
the cell design presented here is due in large part to the 
use of analog transmittances and charge summation in 
the implementation of the input weighting function. 

Figure 4 is the circuit diagram of the limited-inter¬ 
connection analog neural cell. The operation of the cell 
is as follows. Weights are stored dynamically on the 
gates of transistors Tl, T2, and T3. Since the inputs to 
each device will be 0 or 5 volts, a voltage on the gate of 
any of the N-channel MOS input transistors charges the 
output to a degraded value (by a threshold voltage). 
This value equals the transmittance of the input transis¬ 
tor multiplied by the logic-valued input (0 or 1). Since 
a transmittance smaller than a threshold cuts off the 
input signal, a P-channel MOS pass transistor can 
directly load a transmittance value onto the gate of the 
device. This type of transistor passes everything above 
the device threshold level. 

For inputs of 5 volts, the drain parasitic capacitors of 
transistors T7, T8, and T9 are charged by current that is 
flowing through the pass transistors T4, T5, and T6. 
The voltage equals the weights minus the device thresh¬ 
old voltage. For inputs of 0 volts, the pass transistors 
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Figure 5. Simulation timing diagram for an analog neural 
cell. 

allow the capacitors to discharge. Although exact 
multiplication of the input and the transmittance does 
not occur, modifying the training algorithm can com¬ 
pensate for this behavior. 

Figure 5 is the timing diagram for a single synaptic 
branch of a three-branch neural cell. Once the storage 
capacitor in each branch is charged, clock phase 1 (<t> 1) 
turns off to isolate the signal from the input (V). Turn¬ 
ing on clock phase 2 (<J>2) allows the analog summation 
of the weighted input signals, which is compared to the 
logical threshold of the first inverter (which comprises 
T12 and T13 in Figure 4). The first inverter should be 
of minimum size to optimize charge transfer. Transis¬ 
tors T14 and T15 on top of the PMOS pull-up device 
establish the logical threshold and hence allow the 
neural threshold to be set at a voltage lower than 
0.5V dd . The output inverter restores the output voltage 
level Vo and drives the next stage. Since this circuit 
uses only positive weights, a shunting transistor Til 
provides inhibition. 

Test chips were fabricated in 3-micrometer, p-well 
CMOS. Figure 6 shows the results of cell electrical tests 
using an HP16500 testing device. The three inputs are 
labeled VI, V2, and V3, while the output is Vo. The 
clocks are Ol and 02; the address lines are A1 and A2. 
For this example, the three inputs and weights were 
held high. The output becomes high upon completion of 
the summing of the inputs at the end of 02. 


System operation 

A system formed by replicating the analog cell oper¬ 
ates in the following manner. Sharing the inputs, weight 
lines, and output lines keeps the number of I/O pins at 
reasonable numbers as the system expands. For the 



Figure 6. Experimentally obtained cell timing diagram— 
voltage vs. time. 


implementation of 512 neurons in a network 32 ele¬ 
ments wide by 16 elements long, 32 lines are used for 
the weights and I/O. The transmittances are multi¬ 
plexed onto a single line running to each cell. When a 
row is selected for a write, the system writes the weights 
for each cell. This procedure continues for four cycles, 
after which all weights for the row addressed have been 
written. The next row can then be selected for transmit¬ 
tance loading. The entire network can be loaded in 
approximately 10 microseconds, which is an order of 
magnitude faster than the discharge time constant for 
an individual gate. Figure 7 illustrates the floor plan of 
the chip. 

One can pipeline the operation of the network for 
maximum throughput if careful attention is paid to the 
clearing of data that must follow every row of input 
data. A potential danger with storage capacitors is that 
a transmittance value below the device threshold with 
a low input does not allow the capacitor to discharge. 
We have eliminated this problem by pipelining a wave 
of zeros through the network. In addition, the Ol and 
02 lines can be swapped every other row to synchro¬ 
nize pipeline operation. This procedure facilitates the 
multiplexing and encoding of the input data. 


W e have designed, simulated, and tested an adap¬ 
tive, CMOS neural network for pattern associa¬ 
tion. We employed limited interconnection be¬ 
tween adjacent layers of processing elements to reduce 
connection densities to VLSI-implementable levels in 
this synthetic neural IC design. 

Simulations demonstrate that substituting global 
interconnection with limited interconnection and addi¬ 
tional layers does not significantly degrade desired 
neural-like behaviors. We have developed a compact 
analog neural cell that functions with a dynamic RAM 
fabrication line and that drives a style of architecture 
that is fully compatible with LSI technology, si 
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Figure 7. Circuit floor plan. 
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continued from p. 9 

to criminal penalties. Not only is such 
copyright infringement a crime in itself, 
but it may also lay the basis for a con¬ 
viction for mail or wire fraud and for 
violation of the Racketeering Influenced 
and Corrupt Organization (RICO) Act. 

Two basic elements must be present 
to prove copyright infringement. The 
alleged pirate must have had an opportu¬ 
nity to copy the copyrighted work (had 
access to it). The allegedly piratical 
version must be “substantially similar” 
to the original, copyrighted version. De¬ 
termining substantial similarity is a 
well-defined process in the case of nov¬ 
els, paintings, and songs. However, it is 
inevitably unpredictable because it calls 
for making essentially subjective judg¬ 
ments. In the case of computer pro¬ 
grams the concept of substantial simi¬ 
larity is not yet well defined. Accord¬ 
ingly, determining whether program A 
is substantially similar to program B 
must be done without definite standards, 
in the current state of copyright case 
law. 

Protecting computer programs. A 

third, more subtle element of proof of 
copyright infringement is that the mate¬ 
rial the alleged pirate took from the 
copyright owner’s work was protected 
by the copyright. According to settled 
copyright doctrine, the copyright laws 
protect only the expressions of ideas, 
but do not protect the underlying ideas 
themselves. The copyright in a book 
about making microcomputers will keep 
others from copying and selling the 
book. But it will not keep them from 
manufacturing microcomputers in accor¬ 
dance with the teachings of the book. 
Preventing this may be done, if at all, 
only under the patent laws. Unlike copy¬ 
rights, of course, patents issue only after 
an expert examination of the novelty 
and technical merit of the invention. 

The words of a book plus some lim¬ 
ited penumbra of organization and style 
about them comprise authors’ expres¬ 
sion of their ideas. The substantive con¬ 
tent, however, is unprotectible idea. 
When the expression is functional, and 
few or no other modes of expression 
will accomplish the same purpose, the 
expression is said to “merge” with the 
idea and thus become unprotectible. 

For example, paraphrases of the 
words of a novel may well be copyright 


infringements. The words used to state 
instructions for playing a game or for 
using a computer program may be less 
open to idiosyncratic variation than 
those of a novel and thus may be consid¬ 
ered to merge with the idea. Hence, a 
paraphrase of them may well not be a 
copyright infringement. To copy or 
paraphrase such words is to take idea 
rather than expression, if the expression 
in such words has merged with the 
idea. Taking ideas is not copyright 
infringement. 

Similarly, spreadsheet formats, 
Nyquist charts, Bode charts, and other 
forms or diagrams developed to carry 
out the teachings of a particular new 
method of doing something (for ex¬ 
ample, of double-entry bookkeeping or 
devising electronic systems) may be 


The copyright status of 
screen displays for 
computer programs 
remains confused. 


first disclosed in a copyrighted book ex¬ 
plaining the new method. Nevertheless, 
if it is necessary to use forms or dia¬ 
grams, or ones similar to them, to prac¬ 
tice the new method, the forms and 
diagrams are considered as “necessary 
incidents” to the new method. They fall 
into the public domain along with the 
new method. (The new method would 
not necessarily fall into the public do¬ 
main if it were the kind of thing that it is 
possible to patent, and the author pat¬ 
ents it.) 

Although these principles can be ap¬ 
plied to the traditional subject matter of 
copyright without undue difficulty, they 
do not always readily fit computer pro¬ 
grams. Courts disagree about what as¬ 
pects of a program should be considered 
functional or utilitarian. For example, 
one court has held that commercial need 
or desire for compatibility is not an as¬ 
pect of function. Hence, the court re¬ 
fused to excuse copying lines of basic 
input-output system (BIOS) code as the 
taking of idea rather than of expression. 
The court found copyright infringement 
even though the copied BIOS code was 
allegedly needed to make computers and 
computer programs compatible with an 


established, installed base of computers 
and software written for them. 

But another court has held substan¬ 
tially the opposite. It ruled that the right 
to engage in second-sourcing a chip 
(selling a completely substitutable chip) 
legitimized some copying. In that case, 
the accused copyright infringer had the 
right to second-source a microprocessor 
chip and make a fully compatible sec¬ 
ond chip. The court held that this meant 
that it was also legitimate to write the 
microcode for the second microproces¬ 
sor chip in a way that replicated some 
aspects of the microcode of the original 
microprocessor (the lines of code of 
some microcode instructions). 

Courts also disagree about whether 
the copyright in a computer program ex¬ 
tends to anything more than the particu¬ 
lar lines of code or a close paraphrase of 
them. Most programmers will agree that 
translating a program from Basic to 
Cobol would be sufficiently easy that 
marketing the resulting program should 
be a copyright infringement. But when 
the “translation,” if it may even be 
called that, is from Basic to APL or to 
some other kind of language very differ¬ 
ent from Basic, programmers tend to 
disagree. They disagree over whether 
marketing the resulting computer pro¬ 
gram should be regarded as a copyright 
infringement. Nevertheless, that even 
some translations of code are considered 
infringements indicates that the scope of 
a copyright in a computer program is 
wider than merely the specific lines of 
code in it. 

Some courts have said that copyright 
protections apply much more broadly 
than just to the lines of code of a com¬ 
puter program and to translations and 
close paraphrases of them. These courts 
hold that the protections extend to the 
“sequence,” “structure,” and “organiza¬ 
tion” of computer programs, as well. 
Such courts do not rigorously define se¬ 
quence, structure, and organization. But 
their meanings seem to include such 
things as the way subroutines and other 
modules are partitioned from the rest of 
the program and the selection of data 
fields. 1 

One recent decision has protected the 
“sequence and flow” of screen displays 
as part of the protection for a computer 
program. 2 Moreover, while individual 
algorithms are considered ideas, it has 
been argued that the particular choice 
and selection of algorithms to imple¬ 
ment in a computer program is part of 
structure and expression. 
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Even under the structure dispensation, 
however, the structures of two computer 
programs (using that term in the senses 
indicated above) may be quite dissimilar 
and still generate the same visual dis¬ 
play. The only way for the “structure” 
of a computer program to embrace the 
screen displays associated with the com¬ 
puter program would seem to be by out¬ 
right judicial fiat. That would be for the 
court to define the structure of computer 
programs as including their visual dis¬ 
plays. In other words, the visual output 
that results from operation of a com¬ 
puter program would be the same thing 
as the internal organization of the com¬ 
puter program. 

One court went so far as to suggest 
this. But a later decision by another 
court stated that this was a mistaken 
idea and that the first court had “overex¬ 
tended the scope of copyright protection 
applicable to screen displays.” (How¬ 
ever, the second court then held that the 
sequence and flow of successive screens 
was comprehended in the copyright on 
the computer program generating the 
screen displays.) Such a fiat simply 
disregards the fact that there is no par¬ 
ticular correlation between the internal 
organization of display-generating code 
and the visual appearance of the 
display. 

Case law on screen displays. Given 
this general state of uncertainty in the 
computer copyright field, the copyright 
status of screen displays for computer 
programs remains confused. In one re¬ 
cent case, the Softklone case whose 
screen display I discussed earlier in this 
series, 3 the court held that a menu 
screen display must be considered a 
separate work from the underlying code 
of the associated computer program. 
Therefore, it did not matter that the code 
of the defendant’s computer program 
was not substantially similar to the code 
of the plaintiff’s program. The court 
found copyright infringement because 
the two parties’ menu displays looked 
substantially similar and the plaintiff 
had properly registered its display sepa¬ 
rately from the code. Another recent de¬ 
cision endorses the reasoning of 
Softklone on this point. 

In principle, this is the most logical 
view to take. Moreover, it is consistent 
with the view that courts have taken of 
video-game copyrights. As indicated 
earlier, video-game screen displays are 
similar to those associated with other 
types of computer programs. Their treat¬ 


ment under the copyright laws may 
therefore suggest how screen displays, 
in general, should be regarded for copy¬ 
right purposes. Whether video games 
could be protected by the copyright laws 
was at first uncertain. Initially, it 
seemed that only the codes of the com¬ 
puter programs used with the video 
games were protec tible, although no de¬ 
cision expressly so held. After a time, 
the courts came to the view that the im¬ 
ages used in the video games were pro- 
tectible, apart from the underlying 
codes, as “audiovisual works.” 

The courts in addressing video-game 
copyrights noted that many different 
codes could be written to generate the 
same video-game imagery. They may 


The considerable 
difference in approach 
that these decisions take 
indicates the uncertain 
state of the law as to 
screen displays. 


also have been moved by another fact. 
One group of persons created the im¬ 
agery and plot of commercially success¬ 
ful video games (such as Donkey Kong), 
and a different group wrote the neces¬ 
sary code. (That has been true for screen 
displays of computer programs such as 
1-2-3 also.) In any case, the courts were 
prepared to find copyright infringement 
of audiovisual works when visual ap¬ 
pearances were similar and codes were 
dissimilar, on the ground that the two 
things were separate works of author¬ 
ship, each of which was entitled to inde¬ 
pendent protection in its own right. 

Finally, in a case involving a vari¬ 
ation on the Pac-Man video game, the 
court found that the defendant’s code 
was approximately 90 percent bit for bit 
the same as the plaintiff’s code. There¬ 
fore, the defendant infringed the copy¬ 
right in the code. But the defendant’s 
rather slight changes in the code had re¬ 
sulted in visual imagery that did not re¬ 
semble that of Pac-Man. Hence, the 
court found no infringement of the 
copyright that existed in the audiovisual 
work, but there was infringement of the 


copyright in the code. Thus, the video- 
game decisions recognize that screen 
displays and the code associated with 
them are distinct from one another. 
Rather than a one-to-one relationship, 
they have a many-to-many relationship. 

The Softklone decision treating com¬ 
puter program screen displays as wholly 
distinct from code thus appears to be 
consistent with the reasoning used in the 
video-game cases. Other court deci¬ 
sions, however, support the view that 
protection of screen displays may be 
based on the copyright in the computer 
program considered as a whole or on the 
copyright in its code. 

In one 1986 case the court protected 
the screen displays generated by the pro¬ 
gram as audiovisual works. It is unclear 
from the opinion whether the screens 
had been registered with the Copyright 
Office separately from the underlying 
code (which apparently differed consid¬ 
erably between the two programs). If 
they were, again the result would appear 
sound in principle and consistent with 
the video-game precedents. Some of the 
court’s language, however, suggests a 
less sound rule. The court may well 
have considered the copyright in the 
code of the plaintiff’s computer program 
to extend to the screen displays of the 
computer program. Then, that similarity 
between the two parties’ screen displays 
in some unexplained way was supposed 
to make their intrinsically different com¬ 
puter program codes substantially simi¬ 
lar. Two later decisions rejected this ap¬ 
proach as erroneous. 

In one extreme 1985 decision, the 
trial court stressed the close similarity 
of the parties’ screen displays and the 
fact that the two computer programs 
produced results that were indistinguish¬ 
able to naive users. Therefore, even 
though the trial court did not find the 
codes (which were written in different 
programming languages) to be similar, 
even on a translation basis, the court 
nonetheless found copyright infringe¬ 
ment because the screen displays looked 
similar. On appeal of this decision, how¬ 
ever, the court of appeals proved unwill¬ 
ing to place much weight on the similar¬ 
ity of the screen displays. It found the 
computer programs substantially similar 
and thus infringing for other reasons. 1,4 

The considerable difference in ap¬ 
proach that these decisions take indi¬ 
cates the uncertain state of the law as to 
screen displays. Clearly, the difficulty 
of the subject matter and the tribunals’ 
lack of technological education are fac- 
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tors in the disorder of the case law. 

Another factor weighs heavily. Courts 
in some of these cases may be moved by 
the semblance of unfair competition and 
misappropriation. The defendant may 
appear clearly to be appropriating the 
visual effects that the plaintiff created, 
even if not the plaintiff’s coding efforts. 
This view often imparts an irresistible 
impulse to a court to give the parties 
their just deserts, by protecting the first 
comer against the second comer’s com¬ 
petitive imitation. Kaplan of Harvard 
referred to this as the Manichaean her¬ 
esy of copyright law. Desire to achieve 
equity may overcome traditional copy¬ 
right doctrine and its limitations on the 
scope of legal protection. The desire 
may run roughshod over any balanced 
view of the need of workers in the field 
to have a technical repertory or toolkit. 
Thus, perceived equity may conflict 
with the requirements of software prog¬ 
ress. At the very least, a seemingly equi¬ 
table result may be realized via a shoddy 
analysis. 

The Copyright Office 

An important variable in predicting 
the scope of copyright protection is the 
position that the US Copyright Office 
takes in the exercise of its registration 
function and in its rulemaking capacity. 
The Office does not promulgate the 
copyright law; that responsibility be¬ 
longs to Congress. Congress legislates, 
and the courts interpret the legislation. 
The task of the Copyright Office is to 
examine applications for registration of 
copyright to determine whether the ap¬ 
plications and the works to be registered 
meet the requirements of the copyright 
laws. In addition, the Office is author¬ 
ized to issue regulations clarifying 
technical aspects of copyright law, in¬ 
sofar as they affect the registration of 
copyrights. 

Despite the limited scope of the 
Office’s authority, the courts and Con¬ 
gress give considerable deference to the 
views and administrative actions of the 
Office in carrying out its responsibilities 
under the copyright laws. Moreover, the 
registration of a claim of copyright is 
presumptive evidence that the copyright 
is valid, and that the applicant’s state¬ 
ments in the application form are cor¬ 
rect. Accordingly, the Office’s determi¬ 
nation that screen displays should be 
considered separate works from the 
underlying code generating them, and 
should therefore be separately regis¬ 


tered, will probably have a significant 
impact on future court decisions in this 
field. 

Before Softklone. The Copyright 
Office began with no position at all on 
whether screen displays for computer 
programs were separately registrable, 
presumably because it did not recognize 
that an issue existed. It therefore regis¬ 
tered some screen displays, as such, ap¬ 
parently inadvertently. However, the 
registration was consistent with its and 
the courts’ position that video-game dis¬ 
plays were independently copyrightable. 

The Office then adopted a position 
against separate registration of screen 
displays. When Lotus sought to register 
the screens for its 1-2-3 spreadsheet pro¬ 
gram, as a preliminary step to suing two 
competitors, the Office refused to regis¬ 
ter the screens as such. In a January 
1987 letter it stated: 

It is the position of the Copyright 
Office that textual screen displays embod¬ 
ied within the computer program that gen¬ 
erates them are covered by the registration 
for the programs, without either the need 
or justification for separate registration for 
the displays. Because the displays are con¬ 
sidered to be an integral part of the pro¬ 
gram, the authorship in the displays ap¬ 
pears to be the same as that contained in 
the program. Moreover, the Copyright 
Office would not register a claim in the 
format or layout. 


The public and the 
courts simply have to 
make their own ways 
in the world. 


After its refusal to register the screen 
displays of 1-2-3, the Copyright Office 
refused generally to permit menus and 
similar nonpictorial screen displays of 
computer programs to be registered 
separately. It would not accept deposits 
of printouts or photographs of screens, 
or other material identifying them. The 
Office’s policy was that the registration 
of the source or object code of a com¬ 
puter program inherently included what¬ 
ever copyrightable content existed in the 
screens displayed when the program was 
executed. 

The Softklone court’s rejection of that 
theory, however, caused the Office to 


feel that it needed to reevaluate its pol¬ 
icy. It solicited public comment on 
whether it should change its policy, and 
it held public hearings in September 
1987 at which various industry members 
and groups stated their positions. At 
these hearings, industry spokesmen and 
Computer Society 5 representatives urged 
various positions on the Copyright 
Office. These positions ran the whole 
spectrum of possibilities on whether 
screen displays should be considered 
separate works from computer programs 
associated with them, and thus sepa¬ 
rately copyrightable, registrable, and 
protectible. 

After Softklone. After being exposed 
to all of these views, the Copyright 
Office decided to revert to its pre- 
Softklone position against separate reg¬ 
istration of screen displays, with several 
minor exceptions. One was that the 
Office would allow the identifying ma¬ 
terial submitted with applications for 
registration of computer programs to in¬ 
clude pictures of, or other material iden¬ 
tifying, the screen displays associated 
with the programs. Another change was 
that the Office would allow the work to 
be identified in the application form ei¬ 
ther as “computer program” or “com¬ 
puter program code and screen dis¬ 
plays.” In either case, the Office would 
consider the registration to cover the en¬ 
tire copyrightable material in the code 
and screen displays. 

The Copyright Office did not greatly 
elaborate on its above-quoted 1-2-3 
statement in its June 1988 announce¬ 
ment of policy on copyrighting screen 
displays separately. It preferred to stick 
to a terse, conclusory assertion: 

The Office finds that in the interest of a 
clear, consistent public record, our regis¬ 
tration practices should discourage piece¬ 
meal registration of parts of works. 

... Copyright claimants are able to reg¬ 
ister all copyrightable elements contained 
in their work with a single application and 
fee [$10]. 

The public is benefited through the 
maintenance of a clear, accurate, easily 
understandable public record. Permitting 
multiple registrations of parts of works 
would increase confusion among those at¬ 
tempting to use the records of the Copy¬ 
right Office. Subdividing claims [of copy¬ 
right ownership] might also result in mul¬ 
tiple infringement actions and multiple 
claims for statutory damages, based on 
separate registrations. 

The Copyright Office benefits by hav¬ 
ing a simplified administrative analysis. 
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The Office's June 1988 policy state¬ 
ment does not expressly address the 
Softklone court’s statement rejecting the 
policy that the Office adopted. The 
Office acknowledged in its statement 
that unnamed “proponents of separate 
registration contend that the nature of 
authorship embodied in the computer 
program code is substantially different 
from the authorship typically embodied 
in computer screens.” It also acknowl¬ 
edged that everyone agrees that the 
same screen displays can be generated 
by many, very different codes. Even ac¬ 
cepting those points, however, the 
Copyright Office stated that “this does 
not alter the fact [sic] that the computer 
program code and screen displays are 
integrally related and ordinarily form a 
single work.” Those who disagree with 
this policy, the Office said, “must con¬ 
cede that the program code and screens 
are conceptually a single work.” 

Finally, the Office stated that it did 
not intend to address the concern of 
some witnesses (those witnesses from 
the Computer Society, for example) that 
the copyright registration rules appli¬ 
cable to screen displays should specify 
the boundaries or scope of the copyright 
to provide greater guidance to users. 

The Copyright Office stated that it was 
“sympathetic to users who may have dif¬ 
ficulty in determining the scope of copy¬ 
right,” but it considered their problem 
not to be its responsibility. The Office 
said that it will attempt “to keep out of 
the public record any frivolous, un¬ 
sound, or otherwise unjustified” copy¬ 
right registrations, but that is all it is 
prepared to do to “assist the public and 
the courts.” They will simply have to 
make their own way in the world. 

The Office’s June 1988 policy state¬ 
ment needs analysis on two levels: first, 
in terms of what it expressly says; sec¬ 
ond, and more important, in terms of 
what it fails to address. The first, or fa¬ 
cial, level of analysis need not long de¬ 
tain us, and we may proceed directly to 
that. The second level of analysis re¬ 
quires further foundation to be laid, 
however. It calls for a description of 
what undiscussed alternatives were or 
are available for protecting screen dis¬ 
plays, how practical they are in terms of 
ease of implementation, and what the 
likely consequences are of adopting one 
or the other of the different choices. 

A facial critique. There are serious 
problems with the Office’s June 1988 
policy statement, initially even at a lin¬ 


guistic level. Most of it is purely conclu- 
sory, some of its assertions are on their 
face not credible, and what passes for its 
basic rationale is at best metaphysical 
incantation. The policy statement may 
appropriately be subjected first to a cri¬ 
tique at the facial level, to push aside 
the verbal debris. Then criticism can 
continue at a more fundamental level, in 
terms of the policy issues that are not 
addressed but nonetheless must be con¬ 
sidered in any legitimate analysis. 

First, the imagined public benefit that 
the Office perceives in a single copy¬ 
right application and single $10 fee, in¬ 
stead of two applications and $20, is 
trivial. Second, the alleged confusion 
engendered by having screen displays 
registered separately from code exists 
only in the minds of the authors of the 
Copyright Office’s statement. The 
Copyright Office does not explain how 
separate registrations of the two things 
will increase confusion, and the asser¬ 
tion is implausible. 

Third, the alleged dangers of multiple 
copyright infringement actions and mul¬ 
tiple claims for statutory damages are 
figments of someone’s imagination. For 
one thing, pirates of screen displays do 
not copy (and in the reported cases have 
not copied) the code. If they did copy 
the code, it would be easier just to sue 
them for copying the code; that they do 
not copy the code is why we must ad¬ 
dress the screen-display copyright in¬ 
fringement problem. Moreover, the 
stated fear of excessive, multiple statu¬ 
tory damages based on multiple suits 
and registrations ignores the fact that 
the amount of statutory damages to 
award to the plaintiff is discretionary 
with the court. There is no reason to as¬ 
sume that federal judges are any less 
able than the Copyright Office in assess¬ 
ing the equities of this type of situation, 
and that they are incapable of sensibly 
exercising the discretion that Congress 
confided in them. In short, the calcu¬ 
lation of benefits and harms that the 
Office uses to justify its policy against 
separate registration of screen displays 
is hopelessly flawed. 

The “fact that the computer program 
code and screen displays are integrally 
related and ordinarily form a single 
work” is the supposed conceptual ration¬ 
ale for fusing code and screen displays, 
for copyright purposes. But in what 
sense is that a fact? It is not a fact; it is a 
conclusion; and the conclusion is mere 
ipse dixit. 

Even worse, what does it mean? In 


what sense are code and display inte¬ 
grally related, and what does integrally 
related mean, if anything? The diskettes 
on which products such as 1-2-3 or 
dBase are sold ordinarily contain both 
working code (for doing the work) and 
display code (for generating the dis¬ 
plays) on the same diskettes. The dis¬ 
play code on such diskettes generates 
the displays in question here. Those are 
facts, whether you call it integral rela¬ 
tion or something else. But there is no 
rational connection between those facts 
and the conclusion that there should be a 
single copyright covering the code and 
the visual display, even if integral re¬ 
lation is a suitable epithet or characteri¬ 
zation to apply to them. Incidentally, 
“integrally related” is neither a term de¬ 
fined (even implicitly) in the copyright 
statute nor a term of art used in custom- 
honored copyright case law and jargon. 

By the same token, the analysis is not 
furthered by asserting that the code and 
display form a single work. That is 
again a conclusion, not a factual obser¬ 
vation. It does not assist us in determin¬ 
ing whether public policy is better 
served by addressing the copyright in a 
screen display separately from or to¬ 
gether with the copyright in the related 
computer program. 

Moreover, the assertion is also arbi¬ 
trary, as it is for any related works or 
composite work. For example, the music 
associated with “Greensleeves” has been 
used for other works, such as the carol, 
“What Child Is This?” By the same to¬ 
ken, many of the songs in Shakespeare’s 
plays have been given several different 
musical settings; “O Mistress Mine” is 
an example. The same sort of thing 
could be said of the different artists’ il¬ 
lustrations of Alice in Wonderland and 
the Pooh books. Some reasonable ob¬ 
servers would agree that the words and 
music, or books and pictures, form a 
single work; others would disagree; any 
such assertion is necessarily arbitrary. 
Even more important, one may ask only 
in vain how to solve any practical busi¬ 
ness or legal problem by attempting to 
determine whether the music and words, 
or pictures and words, form a single 
work. 

In short, the Office provides no analy¬ 
sis here at all. It offers only an unex¬ 
plained conclusion, smothered in lan¬ 
guage that obscures any factual content. 
Indeed, only one piece of fact appears in 
all of the Copyright Office’s formal ex¬ 
planation of its policy: “The Copyright 
Office benefits by having a simplified 
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administrative process.” Viewers of pro¬ 
grams such as “Yes, Minister” will 
doubtless appreciate the true rationale of 
the Office’s decision. 

I n the next issue I turn to a further cri¬ 
tique of the Copyright Office’s deci¬ 
sion. What are the range of alterna¬ 
tives in addressing protection of screen 
displays and related aspects of user 
interfaces? How do different options re¬ 
late to the policy concerns described 
earlier? To give readers a road map: I 
conclude that protecting screen displays 
by means of existing copyright law is 
more feasible than trying to develop a 
more ideal system. I also conclude that 
separate registration of screen displays: 
a) makes more sense than any other ap¬ 
proach, and b) probably can provide ap¬ 
propriate incentives and encouragements 
to creation without stifling software 
progress. 
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Microprocessors challenge supercomputers 

Ware Myers 
Contributing Editor 


it x he latest microprocessors are 
within striking range of 
supercomputer performance,” 
Eugene D. Brooks III, staff physicist in 
the Computational Physics Division at 
Lawrence Livermore National Labor¬ 
atory, asserted in a recent interview. 

These microprocessors promise about 
half the performance of a Cray IS, pro¬ 
viding that the scalar code being run 
“hits cache well.” (The Cray IS is the 
traditional measuring unit of supercom¬ 
puter performance.) Furthermore, super¬ 
computer users will soon find it eco¬ 
nomic to rethink and recode applications 
to run on parallel machines built from 
microprocessors, according to Brooks’ 
analysis. 

Brooks has pursued parallel¬ 
processing and computational physics 
research at Lawrence Livermore in 
California since 1983. Prior to that, as a 
research assistant at the California 
Institute of Technology, he contributed 
to the development of the Hypercube, a 


large distributed machine based on com¬ 
mercial microprocessor technology. His 
PhD in physics is from Caltech. 

The Intel i860 64-bit RISC-based 
microprocessor represents the level of 
performance that Brooks has in mind. 
Containing 1,000,000 transistors, the 
architecture of this device embraces 
many supercomputer concepts, such as 
memory management, instruction cache, 
data cache, pipelined architecture, one 
operation per clock cycle, and fully 
pipelined floating-point operations. In 
addition to the 40-MHz i860, other 
high-performance microprocessors exist 
in the marketplace. 


The memory-bandwidth 
problem 

A limitation on the performance of 
the recent microprocessors is their abil¬ 
ity to get data from main memory into 


the computing units, Brooks pointed 
out. For applications that are generally 
satisfied by data in cache memory, per¬ 
formance is very high. In applications 
that have to go out to main memory for 
data rather frequently, throughput slows 
down compared to supercomputers be¬ 
cause main-memory bandwidth of these 
microprocessors is much less than that 
of supercomputers. 

The latest microprocessors use page¬ 
mode dynamic RAMs that speed up 
memory access a little bit, says Brooks, 
but not substantially. Supercomputers 
contain very large memory subsystems 
that deliver one word per clock to the 
processor. Designers usually construct 
these subsystems by interleaving large 
numbers of relatively fast static RAM 
chips and overlapping the accesses to 
get high bandwidth. 

“Can you do this right on the memory 
chip itself instead of stacking up 64 
memory chips glued together with MSI 
[medium-scale integration] logic 
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chips?” Brooks asked himself. “Yes, all 
you have to do is put that logic right on 
the memory chip itself. In this way you 
can overcome the very poor main- 
memory bandwidth of these fast micro¬ 
processors.” 

Supercomputer on a chip 

In terms of single-processor hard¬ 
ware, Brooks expects to see “the exact 
architecture of the classic supercom¬ 
puter stuck down on one chip that holds 
the cache, a floating-point unit, a CPU, 
and an interface designed to handle 
memory chips with internal interleaving 
in two to five years.” These systems will 
literally be the equivalent of the classic 
supercomputer, both in terms of scalar 
performance and vector performance— 
as soon as someone straightens out this 
last problem, the memory bandwidth. 

These systems should be mounted on 
little boards, maybe a little larger than 
one’s hand. “They will be someone’s 
personal micro,” Brooks said. Of course, 
they will be 100 times cheaper than the 
supercomputers they will be replacing. 

“If you look at the applications we are 
actually running at the laboratory, you 
find that 90 percent of them are 5- to 
10-minute runs. People could happily 
run such applications on this latest 
group of microprocessors, and run them 
10 or 100 times cheaper,” Brooks added. 
“The number of applications right now 
that actually use all of the processors [in 
a multiprocessor supercomputer], use all 
of main memory, and run in a fully 
chained vector mode at top speed are 
remarkably few.” 

Parallel architectures 

“The real supercomputers of the fu¬ 
ture will be composed of hundreds of 
these boards, arranged so that they can 
talk to each other,” Brooks went on. 
“Right now there are two thrust areas 
in this regard. The first is the high- 
performance equivalent of an E-mail 
network. Here the programs running on 
the various processors send messages to 
each other, with the programmer writing 
explicit code to send and receive mes¬ 
sages. In the second all of the processors 
are wired to share each other’s memory, 
and a given processor can read or write 
another processor’s memory at will.” 

The Intel iPSC2 is an example of 
message-passing architecture. The BBN 
Butterfly represents a shared-memory 
design. 


The only real difference between 
these two architectures lies in the higher 
communications capability of the 
shared-memory approach. “On message¬ 
passing machines the user program con¬ 
structs a message and hands it off to a 
system call on the local processor. This 
processor wires the communication to 
the remote processor, which hands the 
message to a system call, which finally 
hands it to the user program,” Brooks 
outlined. 

“The communications performance on 
message-passing arrangements is not 
dominated by the hardware, but by all 
those layers of software. It is this extra 
systems-software overhead that makes 
message-passing machines poor per¬ 
formers when it comes to communica¬ 
tions. On shared-memory machines the 
construction, dispatching, and responses 
to messages are handled directly in the 
hardware supporting the shared-memory 
paradigm.” 

Brooks argued that a parallel system 
with 100 or so of these powerful micro¬ 
processors will provide performance 
about 100 times beyond that of the clas¬ 
sic supercomputer “quite soon.” In fact, 
“the new microprocessors can do it now 
for applications that don’t need the 
main-memory bandwidth and that can 
run well out of cache. But eventually 
they will do that even for vectorized 
codes.” 


Rethinking the code 

These parallel processors still must 
have code generated for them. Brooks 
remains dubious about converting 
“dusty Fortran decks” to parallel code 
automatically. “People who tell you they 
have an automatic parallelizing compiler 
for even a four-processor system are 
selling you snake oil,” he smiled. “We 
have tested a lot of these supposedly 
automatic compilers, and they just don’t 
work for anything real.” 

He was not talking about vectorizing 
compilers. They do work. He referred to 
the compilers that look at the whole pro¬ 
gram and have to figure it out. 

Still, a good many problems in the 
physical sciences are inherently parallel 
in the first place. In the past they have 
been turned into single-thread programs 
by sweat and tears. In the future they 
will be written in parallel fashion from 
the beginning. 

“We are learning that we have to 
write explicitly parallel code for these 
new machines,” Brooks reported. “There 


are parallel programming languages and 
packages used to enhance Fortran, C, 
and other languages that allow us to do 
a good job.” 

Of course, rethinking and recoding 
will take time and cost money. “If a new 
machine is only twice as fast, nobody is 
willing to spend a lot of time rethinking 
or rewriting the code. There is no lever¬ 
age,” Brooks emphasized. “If the new 
machines are about the same cost, but 
100 times faster than a present-day su¬ 
percomputer, suddenly people in droves 
will want to reprogram their applica¬ 
tions for them. There will be enormous 
leverage.” 

Still, this revolution won’t arrive next 
month. “No, but it might be here in a 
couple of years,” Brooks forecast. 

“There has been a lot of experience with 
parallel machines over the last five 
years. People doing academic work in 
this area have had the vision to see the 
future. There are many new journals re¬ 
porting results. We have a base to draw 
upon. Now suddenly with the latest crop 
of microprocessors, the leverage is 
there. It is no longer an academic exer¬ 
cise. We can still win the war against 
Amdahl’s law.” 

[Eugene Brooks’ views represented in 
this interview are his personal opinions 
and do not reflect those of the United 
States Government or the University of 
California. — Ed.] 

Shockley and the 
transistor 

William Shockley, 79-year-old co¬ 
inventor of the transistor, Nobel laureate 
in physics, and early advocate of Cali¬ 
fornia’s Silicon Valley, died of cancer 
in August this year. 

Highly controversial for his views on 
social science, Shockley nevertheless 
holds an enviable place in history for his 
part in inventing the first semiconductor 
device. This device became the seminal 
innovation of the current electronic age, 
as most modern devices contain its lin¬ 
ear descendants. Former employees at 
Shockley Semiconductor in Palo Alto, 
California, went on to the invent the in¬ 
tegrated circuit and the microprocessor. 

While at Bell Laboratories, Shockley, 
John Bardeen, and Walter Brattain com¬ 
bined their knowledge and experience to 
find a solid-state semiconductor replace¬ 
ment for the classic vacuum tube ampli- 
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fier invented by Lee De Forest in 1907. 

After a year of failing to find a re¬ 
placement, the team achieved success. 
On December 16, 1947, Brattain used a 
wedge-shaped piece of plastic to press 
two narrowly spaced, parallel-line gold 
contacts less than an inch apart against a 
block of germanium. When an electric 
signal was introduced to the germanium 
through the wires, the device amplified 
the signal. Another employee, John 
Pierce, called the device a transistor 
because it transferred current across a 
resistor. 

The year of repeated failures 
prompted Shockley to say later, “A ba¬ 
sic truth that the history of the invention 
of the transistor reveals is that the foun¬ 
dations of transistor electronics were 
created by making errors and following 
hunches that failed to give what was ex¬ 
pected. Pure inspiration contributed less 
to progress than did perseverance and 
the willingness to try again.” He later 
called the process “Creative-Failure 
Methodology.” 

Shockley purchased the transistor pat¬ 
ent for $25,000 when he left Bell Labs 
in 1956 and later founded his own com¬ 
pany with major financing from 
Beckman Instruments. His inability to 
manage engineers and physicists led to 
eight of his best researchers breaking 
away two years later to found Fairchild 


North American microcomputer 
users can reach the popular French 
Minitel information network with a 
local call when using Minitel’s free 
terminal software. With modem para¬ 
meters set at 1200-8-N-l, call BBS 
number 1 (800) 999-6163 and enter 
“Minitel” at login prompt to order 
the software. 

Representatives of ANSI, CEN, 
and CENELEC standards bodies met 
recently to prevent barriers to free 
trade as Europe moves toward creat¬ 
ing a single market by 1992. They 
believe cooperative coordination of 
standards, certification, and testing 
will improve day-to-day business for 
US and European industries. 


Semiconductor. The group included 
Robert Noyce, who went on to coinvent 
the integrated circuit, and Gordon 
Moore, who later invented the micropro¬ 
cessor. Noyce and Moore later founded 
Intel Corp. 

For almost two generations, most Sili¬ 
con Valley start-up companies could 
trace lineage back to Fairchild and be¬ 
fore that to Shockley Semiconductor. In 
fact, an industry genealogy chart of that 
family tree decorated walls in area of¬ 
fices for years. 

Shockley lectured in engineering sci¬ 
ence at Stanford University in California 
for many years. Transistor coinventor 
Brattain died in 1987. Bardeen, who 
went on to win a second Nobel Prize 
for his work on superconductivity, is 
professor emeritus at the University of 
Illinois. 

Posix to speak new 
language 

Recognizing the need for international 
open systems, the IEEE Computer 
Society Posix committee recently 
formed a study group to guide develop¬ 
ment of emerging open systems stan¬ 
dards. The group meets this October lb- 
20 in Brussels to present a preliminary 
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GE Superabrasives and Asahi 
Diamond Industrial Co. Ltd. of 
Tokyo joined forces to develop and 
produce diamond materials using 
chemical vapor deposition. CVD 
diamond’s thermal conductivity 
properties create a variety of 
electronics application opportunities. 

Boston’s Computer Museum 
shows SIGGraph 89 computer art 
through January 4, 1990. Displays 
include two-dimensional works, 
giant moving sculpture, interactive 
environments, and animated items. 

NIST announces a computerized 
database of commercial products suc¬ 
cessfully implementing the Open 


draft of a reference model and interna¬ 
tionalization guidelines for discussion. 
The European Commission hosts the 
meeting. 

The Posix, or Portable Operating 
System Interface for Computer Environ¬ 
ments, family of standards originated in 
response to demand for applications 
portability with the rapid evolution of 
computer technology. Posix makes user 
applications independent of the underly¬ 
ing computer hardware and system 
software. 

Non-US end users of applications cur¬ 
rently must make their systems speak 
their language and comply with their 
cultural and business conventions. For¬ 
mats of date and time, use of local lan¬ 
guage characters, and numeric formats 
typify the difficulties these users face. 

The newly formed study group plans 
to provide the framework for developing 
standardized interfaces to reduce the ef¬ 
fort and associated costs for localizing 
Posix systems. Its work will incorporate 
work already done by the Posix working 
group in Japan, the /usr/group Technical 
Committee (an international group of 
Unix-compatible system users), X/Open, 
and other industry groups. 

The initial Posix standard is both an 
ISO Draft International Standard and a 
US government Federal Information 
Processing Standard (FIPS 151). 


Systems Interconnection standards. 
The experimental Osinet lets vendors 
test each other’s products for interop¬ 
erability. Current vendors include 
DEC, Hewlett-Packard, IBM, NCR, 
Retix, Touch Communications, 
Unisys, The Wollongong Group, and 
Xerox. 

An HDTV standards working 
group plans to develop technical 
specifications and engineering 
standards for US/Canadian TV and 
film producers offering videotape for 
later broadcast or satellite trans¬ 
mission. Contact F.M. Remley, Jr., 
at (313) 763-7479 for data on the 
Society of Motion Picture and 
Television Engineers project. 
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Micro Standards 
returns 

In response to repeated requests to 
reinstate the Micro Standards column, 
Editor-in-Chief Joe Hootman recently 
announced the appointment of Carl 
Warren to IEEE Micro’s Editorial Board 
to fill this opening. 
In addition, Warren 
will begin a new 
bus department, On 
the Edge, beginning 
this December. 
Warren, a 

McDonnell-Douglas 
Space Systems 
Company senior 
scientist/senior engineer, is a senior vot¬ 
ing member of the IEEE Microcomputer 
Standards Committee governing council. 
In addition, he serves as publicity chair 
for the IEEE Standards Activities Board 
of the Computer Society. 

Previous assignments included work 
as a design engineer and as a senior edi¬ 
tor for EDN and Mini-Micro Systems 
magazines. In addition, he has authored 
numerous technical articles and books 
and served as technical advisor to the 
Buscon, Comdex, Interface, and NCC 
trade shows and the War Games movie. 
Warren, who is currently pursuing an 
MSEE degree at the University of 
California, Long Beach, holds degrees 
in industrial technology and chemistry. 


Be a winner! 

IEEE Software invites entries for the 
third annual Gordon Bell Prize until De¬ 
cember 31, 1989. Interested researchers 
should enter their applications of paral¬ 
lel processing to scientific and engineer¬ 
ing problems for consideration of two 
$1,000 awards. The competition centers 
on three categories: performance, price/ 
performance, and compiler paralleliza¬ 
tion. Entrants should submit a three- to 
four-page executive summary describing 
their work. 

Prize sponsor and former National 
Science Foundation division director 
Gordon Bell plans to announce winning 
entries in IEEE Software next spring. 

Submit entries to and request rules 
from 1989 Gordon Bell Prize, c/o IEEE 
Software, PO Box 3014, Los Alamitos, 
CA 90720-1264. 


Microprocessors 
in education 

Conference co-chairs William G. 
Vogt and Marlin H. Mickle announce 
that special sessions on microproces¬ 
sors in education will take place at 
the 21st Annual Pittsburgh Confer¬ 
ence on Modeling and Simulation 
next May 3-4. 

Vogt and Mickle will review pre¬ 
viously unpublished papers in these 
special sessions—if received by 
January 31, 1990—for possible pub¬ 
lication in a special issue of IEEE 
Micro. This issue will address the 
concerns of educators who transmit 
microprocessor information to 
students. 

Submissions should describe sig¬ 
nificant contributions that add to the 
knowledge in a particular area or de¬ 
scribe the progress of research cur¬ 
rently being conducted. 

The conference proceedings will 
carry every accepted paper. Direct 
questions about the conference or 
submission requirements to either co¬ 
chair, Modeling and Simulation 
Conference, 348 Benedum Engineer¬ 
ing Hall, University of Pittsburgh, 
Pittsburgh, PA 15261. 


Current literature 

Practical applications for neural net¬ 
works, tested architectures, and designs 
appear in the 64-page quarterly, Journal 
of Neural Network Computing. Ongoing 
columns will discuss applications in im¬ 
age analysis, visual processing, speech 
analysis/recognition, knowledge pro¬ 
cessing, control systems, robotics, bio¬ 
logical neural systems, and machine 
learning. 

Auerbach Publishers, One Penn 
Plaza, New York, NY 10019; (212) 
971-5000; fax (212) 971-5081; $145lyr. 

Accounts of the most frequently en¬ 
countered dilemmas confronting the 488 
bus user appear in Extending the IEEE 
488 Bus. The 28-page booklet also de¬ 
scribes mix and match solutions for ex¬ 
pansion, extension, and control. 



ICS Electronics Corporation, 2185 
Old Oakland Road, San Jose, CA 
95131; (408) 432-9009; fax (408) 
943-1745. 

AT&T’s Unix System Videotape 
Library is a series of videocassettes cov¬ 
ering all levels of the system. Selected 
topics include fundamentals (basic, 
intermediate, and advanced), shell com¬ 
mand language, C language, and system 
administration. 

AT&T Videotape Library, Jackson¬ 
ville, FL; (800) 554-6400 ext. 7140; 
from $450. 

/usr/group, the International Associa¬ 
tion of Unix Systems Users, states that 
its 1990 products directory to be pub¬ 
lished this December will be free to gen¬ 
eral members. Associate members pay 
$25 and the public, $50. 

lusrlgroup, 2901 Tasman Drive, Suite 
201, Santa Clara, CA 95054; (408) 
986-8840. 

The TMOS Power MOSFET Selector 
Guide from Motorola provides tables of 
linear ICs that are available to drive 
power MOSFETs. The SG56/D guide¬ 
book describes high-energy E-FETs, 
Sense FETs, and standard devices. 

Motorola Inc., Literature Distribution 
Center, PO Box 20924, Phoenix, AZ 
85063; (800) 521-6274; free. 

The 698-page Proceedings of the 
Second International Workshop on 
Signal Processing of HDTV includes 66 
papers from SMPTE, NHK, Sony, 
CCETT, MIT, BBC, Bellcore, and 
Philips. 

North-Holland, PO Box 1991, 1000 
BZ Amsterdam, The Netherlands; Dfl. 
310.00; or PO Box 882, Madison 
Square, New York, NY 10159; $163.25. 
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Editorial Calendar 


FEBRUARY 


AUGUST 

Hot Chips issue 



Details of the hottest microchips on the market today! 

Explore the architectures of National’s embedded processor. 


Communications and buses 

Sun Microsystem's Sparc 64-bit system and graphics 


Making RISCs work together; choosing the best software and 

accelerator, Intergraph's Clipper. Intel’s i486, the MIPS 


interfaces; adding PC slaves and MS-DOS, memory technol- 

RISC with floating-point unit, Texas Instrument’s floating¬ 
point chip, and more. Update your knowledge of Motorola’s 


ogy, LANs, message-passing techniques... 

88000, Weitek’s Sparc EP, and gallium-arsenide processors. 


Submit manuscripts by: 1-15-90 

Ad closing date: January 1 


Ad closing date: July 1 



APRIL 


OCTOBER 



Digital Signal Processing 

Special Far East issue 


Is it the end of the supercomputer era? Comparison of super- 

The latest technology from Japan, Korea, Taiwan, and the 


computer Linpack benchmark price/performance rankings 

Pacific Basin and current TRON Project offerings. 


with the latest 32-bit floating-point DSP chips makes serious 
analysts contemplate the future of the supercomputer 

Ad closing date: March 1 


industry. 



Submit manuscripts by: 3-15-90 



Ad closing date: September 1 



JUNE 


DECEMBER 



Special European and Near East issue 

General Interest Issue 


With 1992 just around the comer, US companies feel the 

Working with ASICs, operating systems, bioengineering, or 


pressure to meet new economic rules to compete for sales in 

other micro-related topics? We’re interested in publishing an 


Europe. What are European companies doing to meet US 

account of your efforts. 


plans for common-market penetration? 



Submit manuscripts by: 5-15-90 

Submit manuscripts by: 11-15-89 


Ad closing date: November 1 

Ad closing date: May 1 




IEEE Micro helps designers and users of microprocessor and microcomputer systems explore the latest technologies to achieve business and research objectives. 
Feature articles in IEEE Micro reflect original works relating to the design, performance, or application of microprocessors and microcomputers. 

All manuscripts are subject to a peer-review process consistent with professional-level technical publications. IEEE Micro is a bimonthly publication of the IEEE 
Computer Society. 

Advertising information: Contact Marian Tibayan, Advertising Department, IEEE Computer Society, PO Box 3014, Los Alamitos, CA 90720-1264; (714) 821 - 
8380; fax (714) 821-4010. 


*Articles may change. Please contact editor to confirm. 
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Richard Mateosian 
Hitachi America, Ltd. 
(415) 244-7456 
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Leftovers 


I n my December 1988 column I men¬ 
tioned an excellent program editor 
for the PC called Brief, by Under¬ 
ware. Unfortunately, I did not have in¬ 
formation on the publisher available 
when I prepared that column, and I have 
since been asked about it. Underware, 

84 Gainsborough Street #103W, Boston, 
MA 02115, wrote Brief. Solution 
Systems, 541 Main Street #410, South 
Weymouth, MA 02190; (800) 821-2492; 
publishes it. 

In my August 1989 column, I made 
some comments about Wingz that were 
seen by at least one reader as somewhat 
more negative than I had intended them 
to be. To clarify, 1 have a generally 
positive opinion of Wingz, but there are, 
in my opinion, a few rough edges to the 
total package. (See Letters, p. 7, this 
issue. — Ed.) 

Freshly prepared 

Born to Code in C, Herb Schildt 
(Osborne McGraw Hill, Berkeley, 

Calif., 1989, 525 pp., $28.95) 

This is a really interesting book for 
anyone interested in programming. In 
the tradition of the still unsurpassed 
Software Tools by Kernighan and 
Plauger (Addison-Wesley, 1976), the 


book offers real, useful, well-written 
programs, illuminated by informed 
commentary. In this case the author pro¬ 
vides an additional device: Interspersed 
among the pages of the nine chapters are 
14 one-page thumbnail sketches, based 
upon interviews conducted with promi¬ 
nent C programmers. 

While both the main text and the pro¬ 
grammer profiles prove interesting, they 
are basically unrelated, which weakens 
the book. It would have been better sim¬ 
ply to make the set of profiles into its 
own chapter. Then some of the contrasts 
and parallels between individual pro¬ 
grammers might have been developed. 
As it is, with the profiles isolated from 
one another and none especially deep, 
they don’t add much to the book. For 
those who keep track of such things, all 
14 of the programmers selected for in¬ 
clusion are men. 

Here are some of the insights pro¬ 
vided by these programmers: 

“Keep rewriting each chunk of code un¬ 
til it’s unabashedly elegant and readable— 
no matter what its supposed importance. 
Think about portability and efficiency 
from the start, but make no sacrifices for 
either until testing shows you must.”— 

P. J. Plauger 

“If you understand how your compiler 
generates code, you will be better able to 
write highly efficient, optimized pro¬ 
grams.”—Tom Green 

“First make it work. Then make it work 
right. Finally, make it work fast!”— 
Donald Killen 


0272-1732/89/1000-0085$01.00 © 1989 IEEE 


“Experiment. Take chances. Don’t be 
afraid to crash your machine. Push your¬ 
self to your limits of understanding. Write 
code. Rewrite code. Rewrite your re¬ 
writes.”—Jeff Betts 

“C’s greater freedom of expression 
makes it easier for a good programmer to 
write a program that is isomorphic to his 
or her conceptual solution.”—Robert 
Ward 

“Use the power of C, don’t abuse it.”— 
Robert Jervis 

“Use the language conservatively; don’t 
become enamored with the tricky things 
you can do.”—Brad Silverberg 

“If you don’t thoroughly understand the 
problem or cannot think of a clear solu¬ 
tion, no programming style or language 
can help you.”—Bjarne Stroustrup 
“Clarity and organization are the keys 
to creating great programs. Become a 
great programmer and you will get the 
most out of C.”—Ralph Ryan 

“A great program performs its task un¬ 
erringly.”—Jack Blevins 

“Great programs all share one common 
feature: they are intuitively easy to use by 
their target users.”—Richard Schwartz 
“It is a good idea to mimic modern 
software-engineering practices, such as 
modular design. Also, you should use full- 
function prototyping.”—Jesper 
Boelsmand 

“When I need to do a detailed design, I 
draw the BNF structure of it. This not 
only forces me to think through the pro¬ 
gram carefully, but gives me the outline of 
it as well.”—Thomas Plum 

“What makes C special is its suitability 
to modular programming.”—Dennis 
Saunders 
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Insights of this sort are interesting, 
but the book must stand or fall on the 
contents of the other 511 pages. Reading 
lots of code, as one of the profiled pro¬ 
grammers points out, is important to de¬ 
veloping facility in programming. This 
book gives you the opportunity to read a 
large body of real code that was written 
with readability in mind and to do so 
with the author looking over your shoul¬ 
der and pointing out the key elements 
and explaining the design decisions. 

The book begins with an interpreter 
for a subset of the ANSI C language 
called Little C. If you’ve used computer 
languages but are a little hazy on how 
they work, this sort of material can be 
very helpful. A simple interpreter is 
much easier to understand than a simple 
compiler, and once you have it working, 
it is much easier to experiment with. An 
understanding of the workings of an 
interpreter for a language can greatly 
enhance your understanding of how to 
use the language. In the absence of this 
sort of concrete connection between ex¬ 
pression and action, it is easy to form 
misconceptions about how certain lan¬ 
guage features work or what they mean. 

The next chapter deals with icon- 
based interfaces. Many people use vis¬ 
ual operating system interfaces, like that 
of the Macintosh computers. These 
interfaces are easy to use but hard to 
understand, and books like Inside 
Macintosh don’t help much. The ma¬ 
terial that Schildt presents here is pretty 
simple, but it does give readers an idea 
of how these interfaces work. You need 
an IBM PC with some sort of graphics 
adapter to run the code. 

Another issue deals with terminate- 
and-stay-resident functions, again in the 
PC-DOS environment. TSR functions, 
as their name implies, remain in mem¬ 
ory ready to run while other programs 
are being used. Usually a special key¬ 
stroke, called a hot key, activates the 
TSR, which then performs its intended 
task and returns control to the running 
program. DOS contains features, not all 
of which have been documented, to 
support the implementation of TSR 
functions. Schildt explains how TSR 
functions work, documents the key un¬ 
documented support features in DOS, 
and develops a substantial TSR 
application. 

The first three chapters are really the 
most helpful ones, since the subject mat¬ 
ter is both valuable and not commonly 
available. The next three chapters also 
present useful, but more mundane pro¬ 


grams: a multitasking kernel, a screen 
editor, and a database system. In the fi¬ 
nal three chapters, Schildt returns to the 
offbeat, dealing with custom fonts, ani¬ 
mation, mouse interfacing, and printer 
control. 

Something ought to be said here about 
Schildt’s programming style, since this 
is, after all, a book meant to teach the 
art of programming in C. Unfortunately, 
the fact is that if you are looking for 
interesting algorithms, cleverly concise 
code, or any other sorts of highlights, 
you’ll be disappointed. Like most real 
programs, this code contains nothing to 
be skimmed. It is competent and neatly 
formatted. It follows the usual practice 
of using lowercase characters exclu¬ 
sively, except for defined constants, 
which are all uppercase. Comments are 
sometimes, but not always, complete 
sentences, beginning with an initial 
capital letter and ending with a period. 
Indentation and bracketing follow a con¬ 
sistent and widely used convention. 
White space is consciously used for 
clarity, but sometimes the code gets a 
little dense. 

In summary, I like this book, and if 
you are interested in programming, I 
think that you’ll like it too. 


The NeXT Book , Bruce F. Webster 
(Addison-Wesley, Reading, Mass., 

1989, 416 pp., $22.95) 

This is not your usual “Gee whiz” 
book about a new gizmo. The Next com¬ 
puter system is an excellent piece of 
work, and this book does give a good 
overview of it. As early reviewers often 
do, Webster has worked closely with 
Next. He has been given access to 
equipment and information that were not 
readily available, so that he is under¬ 
standably grateful to his benefactors. He 
does sometimes gush embarrassingly 
about them. Nonetheless, this book is 
not hype. It is technically competent and 
informative. 

I spent some time playing with a Next 
computer system shortly before reading 
this book, so I was able to read it with a 
concrete picture in mind. The book 
clarified many things that had puzzled 
me. It would certainly be a useful piece 
of documentation to be shipped with 
Next systems. 

The Next computer system is housed 
in a 1-foot cubic black box, which con¬ 
tains four 96-pin Euro DIN connector 
slots connected by a modified Nubus. 
One of these is devoted to the processor 


It would certainly be 
a useful piece of 
documentation to be 
shipped with Next 
systems. 


board, and the others are available for 
other uses. Next provides a bus interface 
chip for developers of plug-in cards. 

The box contains a 3.5-inch optical disk 
drive and can accommodate another 
internal 3.5-inch drive—either a hard 
drive (SCSI) or another optical drive. 
The processor board contains a 68030 
CPU, a 68882 FPU, a 56001 DSP, 8 to 
16 Mbytes of dynamic RAM, some 
video RAM and static RAM, and a cus¬ 
tom 12-channel DMA processor. The 
display, including keyboard, mouse, and 
sound equipment, forms a separate unit. 
Another unit contains the Next laser 
printer, a bare-bones device that uses a 
Canon engine and relies upon the 68030 
on the processor board for the intelli¬ 
gence that is usually built into other la¬ 
ser printers. 

On top of this standard set of hard¬ 
ware sits the Unix-like Mach operating 
system and a set of tools for the user 
interface. In addition, the Write Now 
word processor, the Mathematica data¬ 
base server, a dictionary, a book of 
quotations, and the complete works of 
Shakespeare are bundled with the 
system. 

Webster’s book methodically de¬ 
scribes all of these elements in a top- 
down manner. Early chapters seem like 
baby talk, but precise and accurate baby 
talk that indicates that the author knows 
more than he is saying. And in due time 
further layers of detail are revealed. By 
the end of the book, most sophisticated 
readers will feel quite comfortable 
working in the Next environment with¬ 
out further documentation. What more 
can you ask? 
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New Products 


Marlin H. Mickle 
University of Pittsburgh 

Send announcements of new microcomputer and microprocessor 
products, and products for review, to Managing Editor, IEEE Micro, 
PO Box 3014, Los Alamitos, CA 90720-1264. 


Refreshing your memory 


One million characters in a 
module 

Two single in-line memory modules 
(SIMMs) that come in 3.5-inch-long, 

1-Mbyte configurations allow users to 
upgrade system memory without 
changing PCBs. The company estimates 
that a 1-Mbyte SIMM can store one 
million alphanumeric characters. 

The 8-bit KMM581000 consists of 
eight KM41C1000 DRAMs in 20-pin 
SOJ packages. The KMM591000 con¬ 
tains an additional DRAM for parity 
checking. Both SIMMs come with 80-ns 
to 100-ns row-access times, 8-ms re¬ 
fresh times, and 30-pin dual-in-line 
packaging. Samsung Semiconductor; 
from $185.95 (100s). 

100-ns 581000 

Reader Service Number 10 
80-ns 581000 
Reader Service Number 11 
100-ns 591000 
Reader Service Number 12 
80-ns 591000 
Reader Service Number 13 


Sleight-of-hand controllers 

Two VLSI DRAM controllers update 
slower DRAMs by making them appear 
faster to the system. The KS84C21 and 
the KS84C22 also interface between 
system-memory array and the micropro¬ 
cessor to eliminate a number of logic 
devices. Interleaving provides access to 
DRAM array without fast-page, static- 


SRAMs reach 20-ns speeds 

The MCM6208 and MCM6209 64- 
Kbyte x 4-bit fast SRAMs feature 1.0- 
micrometer HCMOS technology and 20- 
ns access times. A sister device, the 
256-Kbyte x 1-bit MCM6207, also oper¬ 
ates at 20 ns. 

The company also plans 1.0-micro- 
meter versions of its 16-Kbyte x 4-bit 
MCM6288 and MCM6290 devices, 
as well as the 64-Kbyte x 1-bit 
MCM6287—all with 15-ns access times. 

Upgraded MCM4180, MCM62350, 
and MCM62351 application-specific 
memory devices of 16-Kbyte density 
now feature 20-ns access times. 
Motorola. 
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column, or middle-mode DRAMs. 

The KS84C21 and KS84C22 support 
256-Kbyte and 1-Mbit DRAMS; the 
KS84C22 also supports 4-Mbit devices. 

Samsung. 
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Packaging accents density 



Three AK59256 dynamic RAM 
modules come mounted on printed 
wiring boards in SIMM or SIP 
configurations. 


According to the company, 
AK59256 DRAM modules have six 
times the density as those packaged 
in DIP form. The modules contain 
two 256-Kbyte x 4-bit CMOS 
DRAMs in SOJ packaging and one 
256-Kbyte x 1-bit DRAM in PLCC 
packaging. 

In the lower 8 bits of this 9-bit- 
word module organization, each 
tristate-data I/O pin connnects to two 
256-Kbyte x 4-bit DRAM bidirec¬ 
tional data pins. The separately con¬ 
trolled ninth bit can provide parity. 
Accutek (pricing varies in quanti¬ 
ties from 5 to 100,000 units). 
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Big VRAM fires graphics 

The CMOS, 1-Mbit Video RAM 
features access times of 80 ns and 
flash-write, fast-page-mode, and 
write-per-bit operations. The 
M5M482128 configuration combines 
a 128-Kbyte x 8-bit dynamic RAM 
parallel-access, memory-array port 
with a 256-Kbyte x 8-bit, serial- 
access-memory port. 

The M5M442256 version com¬ 
bines a 256-Kbyte x 4-bit DRAM 
port with a 512-Kbyte x 4-bit SAM 
port. 

The M5M482128 features a bidi¬ 
rectional SAM port with split read/ 
write capabilities, block write, and 
logical operations. VRAM applica¬ 
tions include laser printers and 
imaging displays for PCs and work¬ 
stations. 

The M5M482128 comes in 40-pin, 
small-outline, J-lead (SOJ) packag¬ 
ing, while the M5M442256 comes in 
either 28-pin SOJ or 28-pin zig-zag, 
in-line (ZIP) packages. Mitsubishi 
Electronics America; $62 
(M5M482128) (100s); $57.50 
(M5M442256) (100s). 

482128 
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SRAMs expand the bandwidth 

The 1-Kbyte x 8-bit IDT7050 and 
the 2-Kbyte x 8-bit IDT7052 Four- 
port SRAMs allow simultaneous, 
asynchronous access of memory lo¬ 
cations from any one of four ports to 
improve system speed. Interfacing 
four IDT7052s with 32-bit pro¬ 
cessors or DMA devices results in an 
effective bus bandwidth of 640 
Mbytes/s. 

The expandable, 25-ns SRAMs 
come in 108-pin plastic and ceramic 
PGA packaging and 132-pin plastic 
and ceramic quad flat packs. Appli¬ 
cations include multiprocessing and 
DSP. Integrated Device Technol¬ 
ogy; $212.50 (IDT7050, 108-pin ce¬ 
ramic PGAs) (100s); $297.50 
(IDT7052,108-pin ceramic PGAs). 
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SRAMs feature TTL interfaces 

Three SRAMs with transistor-tran- 
sistor-logic I/O interfaces provide 
12-ns address access times and 16- 
Kbit densities. The SSM6116, 
SSM6168, and SSM6170, fabricated 
with proprietary BiCMOS process 
technology, provide conventional 
pin-out and function compatibilities. 

The SSM6116 offers 2-Kbyte x 8- 
bit organization with common data 
I/O functions. The SSM6168 and 
SSM6170 provide 4-Kbyte x 4-bit 
organization. The SSM6170 also of¬ 
fers an output-enable function to con¬ 
trol the tristated bus. The SRAMs 
come in a variety of packaging. 
Saratoga Semiconductor; from 
$21.70 (SSM6116); from $20.80 
(SSM6168); from $20.80 
(SSM6170); (all 1,000s). 


SSM6116 
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VRAM boasts quick access 
times 



The V53C261 CMOS video RAM 


comes in 24-pin, plastic zig-zag, in 
line packaging. 


The V53C261 VRAM clocks 256 
4-bit words at a serial rate of 33 MHz 
with an address access-time of 80 ns. 
Real-time read-transfer operations 
can occur during row-address 
changes. Other features of the 64- 
Kbyte x 4-bit device include a bit- 
mask write function, fast-page-mode 
access, RAS-only and CAS-before- 
RAS refresh, and bidirectional data 
transfer between serial and DRAM 
ports. Vitelic; $14.63 (100s). 
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ASIC design 

Integrated system aids ASIC 
design 

The Fastrack design-automation 
tool in concert with various cell li¬ 
braries provides IC design for com¬ 
mercial and military signal-process¬ 
ing, control, and power applications. 

The digital version offers timing 
and area optimization, hierarchical 
schematic capture, logic and fault 
simulation, static-timing analysis, 
and standard-cell placing and rout¬ 
ing. 

The bipolar analog version offers a 
statistical process database, menu- 
driven interfaces, hierarchical sche¬ 
matic capture, coupled electrical and 
physical-design programs, and lay- 
out-versus-schematic checking. 

The toolset allows automatic back- 
annotation of layout modifications 
and parasitics for resimulation. 
Harris Semiconductor; from 
$20,000 (digital, front end); 

$60,000 (analog, software only). 

Digital 
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Analog 
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Supercells come to the PC 

Zyp-AT Design Tools incorporate 
Supercell VLSI peripherals as well as 
standard cells for schematic capture, 
netlist conversion, simulation, and 
test-program development. Super¬ 
cells contain a pretested functioning 
lock to reduce development time and 
increase chip complexity on IBM PC 
ATs. 

Consisting of timers, peripheral 
interfaces, interrupt controllers, and 
clock generators, the Supercell li¬ 
brary requires a 40-Mbyte hard disk, 
one parallel or two serial ports, and 1 
Mbyte of extended memory per 
1,000 gates for simulation opera¬ 
tions. Zyp-AT tools run with Micro¬ 
soft Windows or DOS with DOS ex¬ 
tenders. Zymos Corporation. 
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Tool boosts design 

The Logic Assistant design- 
automation tool integrates design 
planning, capture, testability, and 
verification functions into one 
environment. Designers can proceed 
from initial specification through 
partitioning and estimation analysis 
to high-level functions with graphic 
feedback for critical path analysis 
and fault grading. 

Designers can also gain feedback 
on trade-offs between gate array and 
cell-based designs, power consump¬ 
tion, chip size, and package selec¬ 
tion. The Logic Assistant is a com¬ 
ponent of the V8 ASIC Design 
Platform. VLSI Technology; 

$15,000 (option in nonstandard 
configurations); no cost (in some 
standard configurations). 

Reader Service Number 38 

CAD ASIC tools run on Sun 

LSI Logic has developed two de¬ 
sign tools that run on all-ASIC Sun 
Sparcstation 1 and 4/330 platforms. 

Silicon Builder automatically pro¬ 
duces physical block layouts at the 
chip level. 

Design Builder—used with the 
Silicon Integrator toolset—estimates 
chip partitioning and die size and 
automatically compiles memory 
cells. The package also synthesizes 
finite state and register-transfer 
machines. LSI Logic. 
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Interface enhances VHDL 

The VHDL interface allows design¬ 
ers to specify designs in the VHSIC 
hardware description language at be¬ 
havioral, dataflow, and structural lev¬ 
els for simulation on Ikos 2300/2900 
systems. The interface consists of a 
VHDL analyzer and compiler and the 


Mix-and-match design tools 



The OACS environment supports 
this bipolar, semicustom circuit as¬ 
sembled in tape-automated bond¬ 
ing, PGA packaging. 

The Open Architecture CAD Sys¬ 
tem hosts third-party design tools on 
Apollo and Sun workstations. Ven¬ 
dors of accommodated tools include 
Mentor Graphics, Valid Logic Sys¬ 
tems, Cadence Design Systems, Syn- 
opsys. and Gateway Design Automa¬ 
tion. Users can customize designs 
with such functions as logic synthe¬ 
sis, multichip/multilevel logic simu¬ 
lation, and digital timing analysis. 
The Electronic Data Interchange For¬ 
mat environment supports both 
CMOS and bipolar ASIC 
technologies. Motorola; about 
$10,000 (license per node, low-end 
configurations); about $100,000 
(fully configured system licenses). 

(Low-end) 
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C++ modeling language. Dataflow 
and structural descriptions translate 
into simulated gate-level netlists. 

Ikos Systems. 

Reader Service Number 43 


Have data, will travel 

The Traveldisk portable disk drive 
stores 10 to 200 Mbytes of data and 
plugs into IBM PC XT/ATs, 80386s, 
laptops, and Macintosh computers. The 
company claims ruggedness for the 3 x 
5 x 7-inch steel-case subsystem, which 


weighs 3.5 pounds. Bus-extender cards, 
laptop connectors, or Mac SCSI cables 
complete the package. Tradewinds 
Peripherals. 
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Take a RISC on critical timing 

The TC2000 RISC for time-critical 
applications boasts a multiprocessing 
architecture that expands from eight 
to 504 processors. Function cards 
that include the 88000 microproces¬ 
sor, 32 Mbytes of on-board DRAM, 
and optional I/O capabilities fit on 
one circuit board. The company re¬ 
ports performance rates of 19 
Dhrystone MIPS, 13 million Whet¬ 
stones/s, and 20 Mflops. 

The system supports two operating 
systems at the same time. While 
some processors run the Psos+m 
real-time executive, others can use 
the Nx Unix-based 4.3 BSD system. 
BBN Advanced Computers; 
$350,000 (base model with graphi¬ 
cal development and debugging 
tools). 

Reader Service Number 45 

Package bridges DEC and IBM 
systems 

Ezbridge connectivity software for 
the DEC RISC Ultrix family of sys¬ 
tems accesses CAD/CAM packages 
on IBM mainframe/midrange sys¬ 
tems and X.25 networks. Plug-and- 
play components residing on DEC- 
supported communications boards 
transfer large binary graphics files to 
and from workstations at high 
speeds. Users can also build peer-to- 
peer applications to integrate data 
from both IBM and DEC systems 
using the LU6.2 protocol. 

Systems Strategies. 
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Vrtx32 for the 80960 

The Vrtx32/960 operating system/ 
multitasking executive for real-time 
embedded applications supports In¬ 
tel's 80960 RISC. The kernel fea¬ 
tures deterministic, fixed-cost system 
calls independent of the number of 
tasks, queues, or other system ob¬ 
jects. Designers can program applica¬ 
tions on Sun, PC, or VAX hosts. 

The company plans product availa¬ 
bility in December 1989. Ready 
Systems; from $4,810 (Vrtx32/960, 
individual license); $7,800 (debug¬ 
ger/monitor). 

Vrtx32 
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RISC and RISC-related products 


Coprocessor for the Mac II 



The Nusuper coprocessor features a 32-bit Am29000 RISC CPU, a 50- 
MHz system clock, and a peak performance of 25 VAX-equivalent MIPS. 


The Nusuper system provides 3 
Mbytes of cache plus direct access to 
32 Mbytes of DRAM on the Nubus. 
Macintosh, PC, Sun, or VAX hosts 
can compile code to run on Nusuper 
under the Macintosh operating sys¬ 
tem. The company supplies the Meta¬ 
ware C compiler and Macintosh OS 

Unix, 80-MIPS RISC runs with 
graphics 

The Tp881 V VME-based multi¬ 
processor configuration—combined 
with the company’s Tp-Ix Unix oper¬ 
ating system—allows multiple users 
to simultaneously access the system 
kernel. 

The Tp-AGCV graphics controller 
provides a 2,048 x 2,048 x 8-frame 
buffer with a graphics processor and 
supports up to 32 hardware windows. 

Tadpole Technology. 
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DEC supports RDMS 

The Ingres relational database- 
management system and 4GL tools 
now run on Ultrix-based Decsystem 
5400 and 5800 computers and Dec- 
station 2100 workstations. A multi¬ 
server, ANSI-compliant architecture 
provides portability of applications 
and transparent data access across 
divergent systems. Relational 
Technology; from $5,000 (single- 
user configuration). 
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runtime interface, as well as the as¬ 
sembler, linker, and debugger. A bus 
master can directly access graphics 
and other I/O cards. Yarc Systems; 
$3,995 (basic system). 
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Faster 88000s abound 

Motorola has announced volume 
production of 88000 RISCs that per¬ 
form at speeds up to 25 MHz at 21 
MIPS. The company reports bench¬ 
mark performance measurements of 
48,387 Dhrystones and 25,000 Whet¬ 
stones per second. Motorola; $697 
(25-MHz 88100); $875 (25-MHz 
88200). 

88100 
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CAE meets Ultrix 

The Workview Series II CAE soft¬ 
ware for ASIC and analog design 
runs on RISC Decstations 3100 in 
the Ultrix operating-system environ¬ 
ment and on Unix-based DEC and 
Sun workstations. The package pro¬ 
vides a framework for integrating 
proprietary and third-party CAE/ 
CAD tools into one system. 

The toolset employs Decwindows 
interfacing, multitasking, and 
networking. Viewlogic; from 
$ 10 , 000 . 
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Convert your PC into a RISC 

The Risclink transputer-based 
enhancement board for the IBM PS/2 
provides parallel processing and 
RISC functioning over the Micro 
Channel bus. 

The 32-bit accelerator connects 
clock, communications, and power 
between on-card transputer modules 
and the PC. 

T414, T425, and T800 32-bit mod¬ 
ules feature 10 MIPS and 4 Mbytes 
of RAM with 25-MHz clock rates. 
The T800 also provides 64-bit, float¬ 
ing-point support. 

A starter kit consists of one card 
and a 1-Mbyte T800 module, an as¬ 
sembler, and the C programming 
language. Levco; $799 (Risclink); 
$1,495 (kit). 
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Classified Ad 


RATES: $5.00 per line, $50 minimum 
charge (up to ten lines). Average five 
typeset words per line, eight lines per 
column inch. Add $10 for box number. 
Send copy at least one month prior to 
publication to: Heidi Rex or Marian 
Tibayan, Classified Advertising, 
IEEE Micro , 10662 Los Vaqueros 
Circle, PO Box 3014, Los Alamitos, 
CA 90720-1264; (714) 821-8380; fax 
(714) 821-4010. 


SENIOR PRODUCT 
DESIGN ENGINEER 

Manufacturer of industrial equipment in 
Greensboro, NC has immediate need for 
Software/Hardware Development Engi¬ 
neer. Responsibilities include product de¬ 
sign and development of multi-processor, 
real time operating system for state-of-the- 
art, highly advanced machinery. Assembly 
languages 8086/286/386 and C required. 
BSEE, 3-7 yrs. exp. with industrial back¬ 
ground pref. Competitive salary, attractive 
benefits and clear growth opportunity 
being offered. 

Resumes to: P.O. Box 93 

Landmark Sq. 

Stamford, CT 06901 
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design 


PCB 

Roll-your-own commands 

The DC/CAD enhanced PCB 
design package allows users to cre¬ 
ate their own commands with a 
built-in programming language. 
Language constructs include If/ 

Endif, While/Endwhile, Exit, Wait, 
and Getkey. 

Improved Gerber output enhances 
photo-plotting capabilities. The 
package, which now supports 96 
apertures, features a command his¬ 
tory that allows users to employ 
one-keystroke, repeat commands. 

A digitizer interface supports 
Autodesk’s ADI interface for most 
commonly used tablets. Design 
Computation; from $495 (DC/ 
CAD-level-2 users eligible for free 
updates within one year of pur¬ 
chase). 
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Tool analyzes signal noise 

The Allegro PCB Engineering 
System now includes a set of inte¬ 
grated signal-noise-analysis tools. 

Designers can analyze board de¬ 
signs by estimating the delay and dis¬ 
tortion characteristics of high-speed 
signals. 

The system also provides thermal 
and reliability analysis and calcula¬ 
tion of wire delays and parasitics. 

Other board analyses include 
reflections, crosstalk, and ohmic loss. 

Design engineers and PCB 
designers can interactively view 
analysis results, identify electrical 
errors, implement design modi¬ 
fications, and evaluate the effect of 
changes. Valid Logic Systems; from 
$7,500 (depending upon software 
configuration). 
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From micros to mainframes 
and back again 

Workstation- and mainframe-based 
PCB design tools can bilaterally 
trade data directly with P-CAD 
Master Designer II and Associate 
Designer PCB design packages for 
the PC. 

Users can start a PCB design using 
P-CAD on the PC, place components 
and route traces on workstation Pro¬ 
fessional Cadam Prance software, 
and integrate finished designs with a 
packaging system designed on the 
mainframe-based Cadam Interactive 
Design package. 

Design data transfers between 
applications can take place elec¬ 
tronically. Cadam, Inc. 
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Library supports surface- 
mount technology 

The Tango-PCB/SMT Plus pattern 
library, based on James 
Blankenhorn’s SMT Land Pattern 
Handbook , includes 116 vendor- 
specific patterns such as quad flat 
pack, inductor, and Mil Spec. 

Many of the International SMT 
Association-endorsed patterns have 
been refined and tested under 
volume-production conditions. 

Designers can add additional pat¬ 
terns to existing Tango SMT librar¬ 
ies. Accel Technologies; $149 (in¬ 
cludes handbook and library on 
disk). 

Reader Service Number 62 


Bureau provides CAD routing service 


Automated Systems, Inc. claims 
overnight completion of 24 x 24-inch 
boards with 1,500 components. ASI 
design service bureaus—equipped 
with a Prance design system running 
on an IBM-4300 series mainframe 
and a Prance GT on the Decstation 
3100 workstation—can interface to a 
number of systems. 

The bureau accepts an ASCII- 


placed netlist via magnetic tape, 
floppy disks, or data over dial-up 
telephone lines. Designers can 
preroute critical signals that require 
specific routing geometries. 

ASI can generate artwork and fab¬ 
ricate prototypes or supply specified 
tooling. Automated Systems, Inc. 
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Chip set supports workstations 

The Supernet Fiber Distributed Data 
Interface (FDDI) add-in board supports 
100-Mbps data rates and allows connec¬ 
tion of 500 network stations. 

The VLSI five-chip set addresses 
physical protocol and media-access con¬ 
trol FDDI LAN standards. (The physical 
media-dependent standard defines opti¬ 
cal data links, cables, and connectors.) 
Users can address the station-manage¬ 
ment function standard in software. 
Advanced Micro Devices. 
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Mainframe math on MS-DOS 

The PC Macsyma symbolic and nu¬ 
merical package runs on 80386-based 
PCs using the MS-DOS operating sys¬ 
tem. Featuring the same capabilities as 
the workstation and mainframe versions, 
the package solves differential and inte¬ 
gral equations, computes Laplace and 
Fourier transforms, and automates vec¬ 
tor and tensor calculus. Symbolics; 
$2,900 each; $510 each (10 copies, cer¬ 
tain academic institutions). 

Commercial 
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Transfer tape from mainframes to 

PCs 

The nine-track Reformatter conver¬ 
sion system contains a 0.5-inch tape 
drive, controller card, and software for 
conversion, tape backup, or restoration. 
Controllers for IBM PCs and PS/2s sup¬ 
port file/data/program transfers between 
mainframes and PCs. The software runs 
with MS-DOS, Xenix, and Unix operat¬ 
ing systems. Microtech Conversion 
Systems; from $3,000. 
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Manufacturer Model 


80X86, i486-related machines 


Mylex Corp. 


Entourage 

Computer 

Corp. 

IBM 


Systems 

Integration 

Associates 


Language 
Processors, Inc. 


Systems 

Integration 

Associates 


US Integrated 
Technologies 


MWS386-25 
system board 


316XMC, 
320MC PCs 


PS/2 486/25 
power plat¬ 
form 


SIA 486 com¬ 
puter series 


Fortran 

compiler 


SIA 386/33 
computer 


OEM-2LP 

motherboard 


Comments 


R.S. No. 


Intel 80386-based board operates at 25 MHz and supports DOS, Unix, 80 

and OS/2 operating systems. Zero wait states result from 64 Kbytes of 
25-ns SRAM and 80-ns SIMM DRAM chips. An AT-standard bus and 
eight memory slots support LANs, file servers, workstations, and CAD/ 

CAM systems. From under $1,300. 

PS/2-compatible PCs feature the Micro Channel architecture and 3.8- 81 

inch profiles. The Intel 80396SX-based, 16-MHz 316XMC and the 
80386-based, 20-MHz 320 MC come with 1 Mbyte of RAM and a 40- 
Mbyte, SCSI hard disk. $2,695 (316XMC); $3, 895 (320MC). 

Upgrade platform for the PS/2 Model 70-A21 replaces the 80386 board 82 
with the 25-MHz, 1.2-million transistor i486 and retains compatibility 
with IBM PC, OS/2, and PS/2 operating systems or 386 software. 

RISC design, an 8-Kbyte, internal cache memory, and an on-chip math 
coprocessor complete the package. $3,995. 

Intel i486-based computer family features 64 Kbytes of static cache 83 

memory plus the i486's 8-Kbyte, built-in cache. AT- and EISA-bus 
versions both include a 4-Mbyte main memory, a 150-Mbyte main disk, 
a 1.2-Mbyte floppy, and serial or parallel ports. The SIA 486/25 and 
/25E turn in 25-MHz performances; the SIA 486/33 and 33/E clock at 
33 MHz. 

Compiler for the Sun 386i workstation running Sun OS Version 4.0 84 

implements ANSI X3.9-1978 Standard and Mil-Std-1753 and complies 
with X Open. VAX Fortran and Fortan-66 extensions help import appli¬ 
cations. Enhancements to Fortran-77 include Do While and End Do 
statements, an Include directive, extended Integer and Logical data 
types, and octal and hexadecimal contants. Also includes Codewatch 
debugging tool to test source-level programs. $995. 

Zero-wait-state, 64-Kbyte cache-memory computer runs at 32.64 MHz 85 
according to Byte Lab five-test suite. (Scoring is equivalent to an IBM 
AT at 1.0.) A 4-Mbyte disk cache supports database and network 
applications. 

Available in 16-, 20-, and 25-MHz versions, 80286-based AT PC 86 

incorporates VGA color graphics, a floppy disk controller and hard-disk 
interface, one parallel and two serial ports, and up to 4 Mbytes of 
DRAM. The surface-mount-technology system also uses Phoenix BIOS. 
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Manufacturer 


Model Comments 


R.S. No. 


Xycom, Inc. 


Micro Express 


Winsystems, Inc. 


Fortron/Source 


AST Research 


XVME-683 IBM PC AT-based computer packaged on two VMEbus cards features 87 

processor 1 or 4 Mbytes of dual-ported memory, SCSI and floppy disk control- 

module lers, and a color graphics controller. Users can chose optional 80387 or 

Weitek math coprocessors. PC AT bus allows direct communcations be¬ 
tween cards without using the VMEbus. A battery-backed clock and 
watchdog timer support industrial applications. 


ME 286- 
12SL and ME 
286-16SL 
PCs 


Two 80286-based PCs in 16.5 x 4 x 15.25-inch cases include a mono¬ 
chrome graphics card and 14-inch flat-screen monitor. Also provided 
are 640 Kbytes of RAM, a choice of 5.25- or 3.5-inch floppy disk 
drives, and serial and parallel ports. Other features include five expan¬ 
sion slots and a socket for an 80287 coprocessor. $799 (12-MHz 286- 
12SL); $1,199 (286-16SL). 


88 


MM-286AT- STD-compatible board runs at 20 MHz and supports OS/2, MS-DOS, 89 

20-2M Xenix, and PC-DOS operating systems. Users can configure the 

board 2-Mbyte board with up to 2 Mbytes of parity DRAM, 128 Kbytes of 

EPROM, and two DMA or 15 interrupt channels. From $1,995. 


386 Netset Zero-wait-state, 33-MHz system for CAD/CAM/CAE, file serving, and 90 
333 computer and multiuser applications comes with 1 Mbyte of memory and a 64- 
Kbyte cache of 15-ns SRAM. Also includes choice of 5.25- or 3.5-inch 
floppy disk drives, one parallel or two serial ports, and an ESDI con¬ 
troller. $4,900 (not including monitor and hard drive). 


486/25 and 

386/33 

Fastboards 


Industry Standard Architecture-based upgrades for 25-and 33-MHz 91 

Premium 386 desktops offer i486 or 386 CPUs. According to the com¬ 
pany, the 33-MHz 386 provides a 33-percent performance gain for the 
25-MHz Premium 386. $2,995 (486/25 upgrade for the 386/33); $3,695 
(486/25 upgrade for the 386/25); $2,395 (386/33 upgrade for the 386/ 

25). 


Micro View 

continued from p. 96 

Microcode, in part, disguises the 
nitty-gritty hardware details of the im¬ 
plementation and maps them onto a 
standardized instruction set. Random 
logic, PLAs, and state machines can also 
disguise details. But microcode allows 
the greatest capability for producing 
“machine-level” instructions that are 
not closely related to native hardware 
functions. 

Cache memory, while primarily an 
implementation feature rather than an 
architectural one, does have some char¬ 
acteristics that can be visible to soft¬ 
ware. Ideally, the existence of a cache 
should be transparent to the software. If 
data or instructions reside in the cache, 
the processor can retrieve them more 
quickly, but the function remains the 


same. However, cache memory has 
some subtle architectural implications. 
Some memory locations (such as 
memory-mapped I/O) must not be 
cached. Most systems provide some way 
for software to control which address 
ranges are cached and which are not, as 
well as some mechanism to globally en¬ 
able or disable the cache. 

In general, only operating system 
software uses these functions. Thus, 
they represent a separate, and less criti¬ 
cal, level of architectural compatibility. 
Processor designers can change func¬ 
tions that are used exclusively by sys¬ 
tem software without affecting applica¬ 
tion programs. Thus they worry less 
about changing these functions than 
about general architectural characteris¬ 


tics that affect all software. 

As with cache memory, only operat¬ 
ing system software typically uses mem¬ 
ory management functions. However, 
unlike the cache control functions, the 
interactions between system software 
and memory management hardware are 
complex. Furthermore, if designers ex¬ 
pose the memory management model to 
user-level software, changing the model 
can have significant effects on applica¬ 
tions programs. The major offender in 
this regard is Intel’s 80X86 architecture. 
The 8086 uses base registers, which are 
accessible by user programs, as part of 
the memory management scheme. The 
application programmer must explicitly 
deal with the resulting segmented mem¬ 
ory space. The segment registers func- 


94 IEEE MICRO 






tion differently on a 286 (in protected 
mode) than on an 8086. So 8086 appli¬ 
cation software is not upward compat¬ 
ible to the 286 unless the 286 runs in 
“real mode,” effectively disabling all 
memory management functions. 

Thus, while memory management 
functions do not strictly form part of 
the basic processor architecture, chang¬ 
ing them can be painful for software 
designers. 

Sparc typifies an architecture that was 
initially defined without a memory man¬ 
agement unit. Sun has its own MMU de¬ 
sign, which it has patented. It is not 
TLB-based (translation look-aside 
buffer) and requires a considerable 
amount of static RAM and discrete 
logic. (Sun makes the patent available 
for licensing, but no chips yet imple¬ 
ment Sun’s MMU scheme.) Seeing the 
need for a standard. Sun eventually in¬ 
troduced its “reference” MMU specifi¬ 
cation for VLSI (very large scale inte¬ 
gration) implementation. Sun’s latest 
workstations use a proprietary ASIC 
(application-specific IC) implementation 
of Sun’s original MMU scheme, how¬ 
ever. Future Sun clones will likely use 
the “reference” MMU chips that are be¬ 
coming available. 

As a result, while Sparc application 
programs should enjoy binary compati¬ 
bility across a variety of hardware im¬ 
plementations, operating-system code 
will have to be changed for each MMU. 
Sun heavily modified its current version 
of Sun OS to isolate the MMU-depend- 
ent code and make it easier to change 
the MMU model. In earlier Unix ver¬ 
sions, changing the MMU was a large 
project, affecting many parts of the 
system software. 

Architectural issues 

The truly architectural issues center 
on the instruction set and the register 
file. While we see many differences 
among instruction sets, within the range 
of architectures we consider here (32-bit 
microprocessors), few differences as¬ 
sume significance other than the distinc¬ 
tion between RISC (reduced instruction- 
set computing) and CISC (complex 
instruction-set computing) styles. To be 
sure, the Motorola 68020 instruction set 
differs from the 80386 instruction set, 
but these differences are relatively un¬ 
important. Similarly, the instruction sets 
of Sparc, MIPS, Motorola 88000, and 
Am29000 processors do not differ 
much. Each has some weak points, such 


as Sparc’s lack of an integer multiply in¬ 
struction (rather than just a multiply 
step). But these differences rarely be¬ 
come deciding factors in selecting a 
processor. 

I do not mean to say that architectural 
issues are unimportant. Countless pages 
have been written—and will be writ¬ 
ten—on the advantages and disadvan¬ 
tages of various architectural features. 
Given two equally good implementa¬ 
tions, architecture is certainly important 
in making a selection. 

The architecture also affects the im¬ 
plementation possibilities. Indeed, this 
is the key reason for RISC architectures, 
and makes them easier to build than 
CISC designs. A minimum Sparc CPU 
requires less than 20,000 gates (about 
80,000 transistors). It achieves perform¬ 
ance well in excess of that achieved by 
the Intel 386 with about 300,000 transis¬ 
tors (although the 386 includes an 
MMU, while the Sparc chip does not). 

As a result, we will see Sparc imple¬ 
mentations in gallium arsenide long be¬ 
fore we see GaAs 386 implementations. 

Two key differences separate RISC 
and CISC architectures. The differences 
have nothing to do with “reduced” in¬ 
struction sets versus “complex” instruc¬ 
tion sets, but nonetheless we commonly 
associate them with these categories. 
Because most CISC architectures were 
defined when the number of transistors 
per chip was relatively small, architects 
made some trade-offs that are no longer 
optimal. 

First, most CISC architectures offer 
fairly small register sets—usually no 
more than 16 registers, and often no 
more than eight. All RISC architectures 
include 32 or more registers, which sig¬ 
nificantly reduces the number of mem¬ 
ory load and store operations that must 
be performed. No straightforward way 
exists to extend the register set of CISC 
architectures while retaining binary 
compatibility. 

Second, most CISC architectures use 
an accumulator-based or two-operand 
instruction format, which destroys one 
of the operands by overwriting it with 
the result of the calculation. This ap¬ 
proach requires additional instructions 
to copy the operand to another register 
if it is needed after the calculation. 

These two characteristics give today’s 
RISC architectures an advantage, but a 
new CISC architecture could use them 
as well. Nevertheless, the lack of these 
features is a permanent disadvantage for 
today’s CISC architectures. 


Implications 

Why bother with these fine distinc¬ 
tions? They become important because 
all future compatible chips will be stuck 
with the architecture, while implementa¬ 
tions can change. Selecting a micropro¬ 
cessor is often a long-term decision. 
Once a company adopts a particular 
processor for a product line, the inertia 
of the software that has been developed, 
expertise gained, and vendor relation¬ 
ships make it painful to switch architec¬ 
tures later on. Designers can remedy de¬ 
fects in today’s implementation in future 
chips, but they find defects in architec¬ 
ture much more difficult to correct. 

Architectures are, of course, abstract 
entities, and physical systems must be 
built with available implementations of 
that architecture. Thus, the characteris¬ 
tics of current implementations are im¬ 
portant. If, however, we select a proces¬ 
sor to form the basis of what is hoped to 
be a long-lived product line, we may 
willingly live with a suboptimal im¬ 
plementation if the architecture is 
sound. We might also have a good rea¬ 
son to think that future implementations 
will be better. For many designers, the 
question is not which chip seems best 
today. Rather, designers ask, “Which 
processor family will have the best chip 
(for my application) in three years, 
when my product goes into volume pro¬ 
duction. And, what will be available for 
the generation after that?” 

Designers should also ask, “How long 
will the recently established RISC archi¬ 
tectures evolve while maintaining binary 
compatibility?” For example, IBM’s 
second-generation RT will require re¬ 
compilation of existing programs. Some 
observers feel RISC architects inevita¬ 
bly will want to add more complex ad¬ 
dressing modes and other “CISC” func¬ 
tions to increase performance. This type 
of evolution will occur primarily in an 
upward-compatible fashion, but some 
designs may deviate from binary com¬ 
patibility to allow a significant perform¬ 
ance improvement. 
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Distinguishing architectures from implementations 


[With this issue, IEEE Micro wel¬ 
comes Michael Slater as he assumes 
responsibility for this column. The edi¬ 
tor and publisher of Microprocessor 
Report, an industry technical newsletter, 
Slater authored the textbook, Micropro¬ 
cessor-Based Design. He also organizes 
the annual Microprocessor Forum sym¬ 
posium, the technical conference for mi¬ 
croprocessor chip and system designers. 
He welcomes your comments and sug¬ 
gestions for the column. — Ed.] 


T he field of microprocessor-based 
design continues to evolve at a 
rapid pace, as it has for nearly 20 
years now. In the past few years a flurry 
of new microprocessor architectures, 
bus structures, interface standards, oper¬ 
ating systems, and design tools has 
emerged, producing many opportunities 
but also much confusion. 

Selecting architectures and design ap¬ 
proaches for microprocessor-based sys¬ 
tems is not an exact science. The trade¬ 
offs are complex, and the diversity of 
options available in commercial prod¬ 
ucts makes the decision task all the 
more difficult. Well-informed buyers 
talk to a variety of vendors and perform 
their own analysis of each product’s 
strengths and weaknesses for their appli¬ 
cation. Unfortunately, since each vendor 
tends to view the world in a way that is 
most favorable to its product, such re¬ 
search projects often produce as much 
confusion as clarity. 
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In this bimonthly column I plan to 
clarify some of the key issues in micro- 
processor-based hardware design. In this 
first column, I wrestle with the various 
meanings of architecture and implemen¬ 
tation, explore the implications of the 
distinction, and discuss its importance to 
the microprocessor selection process. 

What is architecture? 

Computer scientists use the term 
architecture in various ways. One us¬ 
age, which I will try to follow, refers to 
instruction set architecture. In essence, 
this usage includes those aspects of the 
microprocessor design that are visible to 
the software. Many completely different 
hardware designs may follow the same 
architecture, allowing them to run the 
same software. The differences among 
such designs are implementation differ¬ 
ences. 

The machine-language instruction set 
forms the core of a processor’s “person¬ 
ality.” Chip designers must take care 
with changes to the instruction set if 
architectural compatibility is to be main¬ 
tained. Designers maintain upward 
compatibility (the ability to execute old 
code on a new machine) by including an 
existing instruction set as a subset of a 
new instruction set and using previously 
undefined opcodes to add new 
instructions. Downward compatibility 
(the ability to run code for a new ma¬ 
chine on an older one) can be provided 
if older machines trap on undefined 
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opcodes. The trap handler on the old 
processor then emulates the new 
instructions. 

Implementations can vary greatly 
while retaining architectural compatibil¬ 
ity. The term microarchitecture refers to 
the detailed implementation, and has an 
entirely different meaning than architec¬ 
ture as used above. Architecture is of¬ 
ten used in either way, so we must be 
very careful in interpreting comments 
about it. In addition, different levels of 
architectural compatibility exist. 

Levels of visibility 

Software functions completely un¬ 
aware of many implementation details, 
except that the speed of execution may 
be affected. A processor with a 32-bit 
architecture may have only a 16-bit ad¬ 
der with microcode or other sequencing 
logic to perform a 32-bit add in two 
steps. For that matter, it could have a 
single-bit adder and perform the addi¬ 
tion in 32 steps. Similarly, the width of 
the data bus is not an architectural issue, 
as long as the hardware will provide the 
sequencing necessary for data transfers 
of any size defined by the architecture. 
Other common examples of architectur¬ 
ally irrelevant implementation details 
include the use of a barrel shifter to 
speed up multibit shift operations and 
the various algorithms that can be used 
to implement multiplication and division 
in hardware. 
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